Abstract





Developments in science and technology have relied extensively on modeling and pattern recognition. In the present work, we describe a framework for modeling how models are built that integrates concepts and methods from a wide range of fields. The information schism between the information in the real world and that which can be gathered and considered by any individual information processing agent is characterized and discussed, followed by the presentation of a series of requisites adopted while developing the reported modeling approach. The issue of mapping datasets into models is subsequently addressed, together with some of the main difficulties and limitations it implies. Based on these considerations, an approach to meta modeling how models are built is then progressively developed. The reference M* meta model framework is presented first, which relies critically on associating whole datasets and respective models through a strict bijective association. Among the interesting features of this model are its ability to bridge the gap between data and modeling and to pave the way to a paired algebra of data and models, which can be employed to combine models in a hierarchical manner. After illustrating the M* model in terms of patterns derived from regular lattices, the reported modeling approach continues by discussing how sampling issues, error and overlooked data can be addressed, leading to a corresponding extension of the meta model. The frequent and important situation in which the data needs to be represented in terms of respective probability densities is treated next, yielding a probabilistic version of the meta model, which is then illustrated with a real-world dataset (the iris flower data). Several considerations about how the developed framework can provide insights about data clustering, complexity, causality, network science, collaborative research, deep learning, and creativity are then presented, followed by overall conclusions.


‘So much closer on the lake, the new star.’ 


LdaFC. 
1 Introduction


Despite the many intricacies that characterize our world and the universe, some properties seem to underlie the structure and dynamics of natural phenomena in a shared, fundamental, and systematic manner. Among them are the following two intriguing facts: (i) every portion of our world is strongly interconnected with, and dependent on, the others along time and space; and (ii) at the same time, there are severe limitations to the information that can be collected and processed by any computing system, be it natural (e.g. humans) or artificial (e.g. digital computers).

Perhaps necessarily and unavoidably, these two important principles directly oppose one another, establishing an interesting global/local duality or tension which may well have been, as will be suggested in the present work, at the very core of the appearance and unfolding of life and intelligence themselves. On one side, we have a vast web of global interconnections extending through an impressive range of scales; on the other, the extremely limited resources of every kind available to any known information processing system, including living beings. This intriguing duality is deemed here to be important and inexorable enough to warrant the name of information schism which, though related to the semantic gap, implies a wider context.





The level of interdependency of natural structures and phenomena is so breathtaking that it often goes unnoticed. The following two examples should suffice to illustrate this issue. First, consider the dynamics of a pendulum on a table in front of us. Because the gravitational force always acts between any two bodies with mass, and because this force field extends to infinity, the movement of this pendulum will be affected by every single bit of mass in the universe. Though this influence certainly becomes smaller and smaller as the distance between the masses increases, it will nevertheless be present and will therefore, even as an infinitesimal element of the background noise, limit the accuracy achievable in measurements and predictions of the pendulum dynamics. Our second example, which is related to interactions along time, consists of relatively small and localized events that reverberate through long periods of time, as was possibly the case with the meteor that led to the extinction of the dinosaurs, thereby allowing mammals to evolve.

Interestingly enough, both of the above issues become particularly critical in non-linear systems, which encompass a wide range of real-world phenomena, because these systems can amplify even minute perturbations into major effects. Other critical limitations of information processing by agents include, but are not limited to, the fact that several objects and states are non-observable in nature, the limited time available for any action, and the possible presence of error and noise in every measurement taken from the real world.
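The amplification of minute perturbations by non-linear systems can be seen in a minimal Python sketch based on the logistic map x_{n+1} = r x_n (1 − x_n), a standard textbook example of chaotic dynamics that is not part of the framework developed here. The value r = 4, the initial value 0.2, and the perturbation of 10^-10 are arbitrary illustrative choices.

```python
from typing import List


def logistic_trajectory(x0: float, r: float = 4.0, steps: int = 50) -> List[float]:
    """Iterate the logistic map x -> r * x * (1 - x), starting from x0."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


# Two initial conditions differing by an "infinitesimal" perturbation (illustrative values).
a = logistic_trajectory(0.2)
b = logistic_trajectory(0.2 + 1e-10)

for n in range(0, 51, 10):
    # The gap grows roughly exponentially until it saturates at order 1.
    print(f"step {n:2d}: |difference| = {abs(a[n] - b[n]):.3e}")
```

With r = 4 the separation roughly doubles at each step on average, so after a few dozen iterations the two trajectories become unrelated, even though they started indistinguishably close.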

So it is that nature has somehow self-organized into a never-ending web of interconnections and interdependences extending along every possible scale of space and time. From the perspective of nature itself, this does not constitute a problem, because the physical world does not seem to require the help of external computing resources.

However, with the appearance of individual agents, such as living beings, the information schism acquired much more relevance, representing a major challenge to be circumvented in some manner. The main problem here is that agents, be they natural or artificial, individual or collective, can only exchange and process minute portions of mass, energy and information. As the survival and perpetuation of these agents rely critically on their interaction with the environment (which may also include other agents), so as to make suitable decisions based on predictions, it becomes mandatory that each of these agents incorporate adequate means for receiving, processing, predicting, and acting on the environment.





When contemplated from this perspective, the continuing existence of living beings and other information processing entities appears truly phenomenal. After some additional reflection, we may perceive that the success of individual living beings has largely relied on taking timely decisions about what to do based on information sampled from their environment, while also taking into account previous experiences consolidated through some kind of memory, such as nervous systems or the biochemical dynamics characteristic of each species. Figure 1 illustrates the basic condition of an individual A in an environment E.


Figure 1: An individual agent A in an environment E exchanges and acts upon mass, energy and information. The survival and reproduction of this agent depend critically on taking timely decisions by processing the information received from the environment and then acting suitably on it, while taking into account previous experiences stored in some manner. Agents capable of information processing can be modeled as incorporating memory and processing capabilities, which are both finite and require some physical expense in terms of mass and energy, being consequently limited. There are also constraints on the time it takes to make predictions and decisions, and it is important to keep in mind that the environment E almost certainly also contains other information processing agents, representing potential threats or collaborators.


A more in-depth view of the concept of decision making is particularly critical for our formulation. It should be recalled that effective decision making relies critically on at least the following aspects: (i) sampling enough information from the environment; (ii) taking into account previous experience, especially in the sense of possessing a comprehensive understanding of the environment; (iii) identifying the situation as corresponding to some instance of an already experienced problem; and (iv) having the means for making accurate predictions that guide the possible decisions. More informally, the models underlying such predictions could be understood as magic mirrors reflecting not only real-world properties, but also their respective consequences.

An intriguing relationship can thus be established be- 
tween the action of taking a decision and the basic princi- 
ple underlying science, namely the construction of models 
by using the scientific method (e.g. [1, 2]). 

The scientific method also relies importantly on obtaining quantified information about the phenomenon of interest, as well as on previous related knowledge, in order not only to better understand that phenomenon, but also to make accurate predictions about it. Thus, in essence, decision making by information processing entities can be directly related to the development of models. We therefore posit that decision taking by agents is basically the same as scientific modeling, sharing not only the same objectives, but also the mechanisms for achieving results. In a sense, a model can be understood as a logic-mathematic-computational construct involving conditions that need to be satisfied by the observed data.
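The view of a model as a set of conditions that the observed data must satisfy can be sketched in Python as a predicate over data elements, with a dataset satisfying the model when all of its elements do. The pendulum condition T = 2π√(L/g), the 2% tolerance, the observations, and the names `Model`, `pendulum_model` and `satisfies` are hypothetical choices for illustration, not part of the original formulation.

```python
from math import isclose, pi, sqrt
from typing import Callable, Iterable, Tuple

# A data element: (rod length in m, measured small-oscillation period in s).
Element = Tuple[float, float]
# Hypothetical representation: a model is a condition (predicate) on one element.
Model = Callable[[Element], bool]

G = 9.81  # gravitational acceleration in m/s^2, treated as a constant


def pendulum_model(element: Element, rel_tol: float = 0.02) -> bool:
    """Condition: the period matches T = 2*pi*sqrt(L/g) within a relative tolerance."""
    length, period = element
    return isclose(period, 2 * pi * sqrt(length / G), rel_tol=rel_tol)


def satisfies(dataset: Iterable[Element], model: Model) -> bool:
    """A dataset satisfies a model when every one of its elements does."""
    return all(model(e) for e in dataset)


observations = [(0.25, 1.00), (1.00, 2.01), (2.00, 2.84)]  # hypothetical measurements
print(satisfies(observations, pendulum_model))                  # True
print(satisfies(observations + [(1.00, 3.0)], pendulum_model))  # False
```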

Interestingly, a further relationship can be established between individual decision taking and the area of pattern recognition (e.g. [3, 4, 5]). In this area of great current interest, the main objective is, given a set of properties or features measured from an object, to reach a conclusion about its possible class or category. There are two main types of pattern recognition: supervised and unsupervised, the former being characterized by the availability of previous knowledge, examples or prototypes of the existing categories, which are not available in unsupervised classification. In both cases, and especially in the supervised case, a direct parallel can be established between decision taking/model building and pattern recognition. Obtaining experimental measurements in modeling can be directly associated with deriving the properties of the entities to be classified, the consideration of previous experience/models is reflected in the information available about the classes, and the obtained classification can be directly paired with the action of making predictions.
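The parallel with supervised pattern recognition can be illustrated by a nearest-prototype classifier, one of the simplest supervised schemes: previous knowledge is encoded as one prototype per class, and a new object, described by its measured features, is assigned to the class of the closest prototype. The two classes and their prototype feature values below are hypothetical.

```python
from math import dist
from typing import Dict, Sequence

# Previous knowledge: one prototype feature vector per known class (hypothetical values).
PROTOTYPES: Dict[str, Sequence[float]] = {
    "class_A": (1.0, 1.0),
    "class_B": (4.0, 3.0),
}


def classify(features: Sequence[float],
             prototypes: Dict[str, Sequence[float]] = PROTOTYPES) -> str:
    """Assign the object to the class whose prototype is nearest in Euclidean distance."""
    return min(prototypes, key=lambda label: dist(features, prototypes[label]))


print(classify((1.2, 0.8)))  # class_A
print(classify((3.5, 3.1)))  # class_B
```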

The intrinsic relationship between decision making, pattern recognition and modeling can also be inferred from the evolutionary perspective that endowed humans (as well as other living beings) with these abilities. The point here is that, if pattern recognition and modeling were distinct, they would require different neuronal and cognitive abilities, which would be much more expensive than sharing the same neuronal resources for addressing both of these critically important tasks.

We could go much further, because several other activities such as urban planning, land management, education, and economic and social policies, to name but a few, may also be related to modeling, decision taking, or pattern recognition. Remarkably, the field of arts can also be related to the modeling framework by understanding an art piece as a dataset and the model as the conditions estimated to be necessary for its positive appreciation and/or impact. Ultimately, it becomes difficult to find a human activity that cannot be somehow related to model building and/or pattern recognition.





The above discussed relationship between decision taking, modeling, and pattern recognition probably represents the most important and critical characteristic of the approach discussed in the current work, for at least the following reason: it allows us to incorporate concepts and methods from a wide range of related scientific fields, especially philosophy of science, artificial intelligence (e.g. [6]) and pattern recognition, discrete mathematics, statistics, physics, complex networks, and data science, to name but a few.

Another important feature of the reported approach is its logic-mathematic-computational formalization of the activity of model building through a meta-model, which can help us better understand, and draw more objective and general conclusions about, the properties, advantages and limitations of model building.

Though new models can be obtained in a never-ending number of ways, here we focus on a methodological framework based on logical combinations of the existing models while considering set operations between the respective datasets. More specifically, instead of logically combining models irrespective of data (which can also be done), the considered method also allows the dataset of interest to be expressed as a combination of other existing datasets, the sought model then being obtained as a logical combination of the models associated with the respectively identified datasets. It is shown that, by establishing a bijective association between the datasets of interest and their respective models, it becomes possible to obtain a bridge between these two domains, with each set operation used to manipulate datasets becoming bijectively associated with a respective logical operation.
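Under the simplifying assumption of a finite universe of data elements, the pairing of set operations on datasets with logical operations on models can be sketched by identifying each model with a predicate and each dataset with the set of elements satisfying it (its extension). The integer universe and the two example models are hypothetical; the point is only that intersection, union, and complement of datasets correspond to AND, OR, and NOT of models.

```python
from typing import Callable, FrozenSet

Model = Callable[[int], bool]
UNIVERSE: FrozenSet[int] = frozenset(range(1, 31))  # hypothetical finite universe


def extension(model: Model) -> FrozenSet[int]:
    """The dataset associated with a model: all elements of the universe that satisfy it."""
    return frozenset(x for x in UNIVERSE if model(x))


def m_and(m1: Model, m2: Model) -> Model:
    return lambda x: m1(x) and m2(x)


def m_or(m1: Model, m2: Model) -> Model:
    return lambda x: m1(x) or m2(x)


def m_not(m: Model) -> Model:
    return lambda x: not m(x)


even: Model = lambda x: x % 2 == 0
multiple_of_3: Model = lambda x: x % 3 == 0

# Each logical combination of models is paired with the corresponding set operation.
assert extension(m_and(even, multiple_of_3)) == extension(even) & extension(multiple_of_3)
assert extension(m_or(even, multiple_of_3)) == extension(even) | extension(multiple_of_3)
assert extension(m_not(even)) == UNIVERSE - extension(even)
print(sorted(extension(m_and(even, multiple_of_3))))  # [6, 12, 18, 24, 30]
```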

Given the data and model realms, it is possible to start with some data of particular importance and then look for a model, or vice versa. An interesting asymmetry seems to characterize the development of science through model construction: though new models can be obtained by exact logical combination of existing models, these models would still need to be associated with some real-world data, which corresponds to their physical validation. Interestingly, the proposed bijective association between datasets and models avoids this situation, because every combined model will necessarily be associated with a respective dataset.

In spite of its idealizations in several respects, this first 
meta-model, which is henceforth referred to as the M* 
model, provides a sound reference for better understand- 
ing more realistic modeling through the progressive incor- 
poration of characteristics such as noise, incomplete sam- 
pling and/or characterization, classification errors, etc. 

In this work, the basic M* framework is also extended to address problems such as sampling, error, and noise, yielding a corresponding extension of the meta model, and to take into account the stochastic representation of the data in terms of respective probability densities, therefore leading to a probabilistic version of the meta model. Both the M* model and its probabilistic version are illustrated by respective case examples.


To complement this work, we discuss how the concepts and methods related to the developed modeling frameworks can provide insights about areas of great current importance, including clustering, complexity, collaborative research, deep learning, and creativity.


2 Specifying the Problem 


It often happens that the difficulties in developing a solution to a given problem ultimately derive from a lack of, or imprecision in, the specification of the sought goals and constraints. Thus, it is reasonable to begin the development of the approach reported in this work by listing the many requirements and characteristics that were initially specified.

The main objectives and constraints that have been 
adopted in the currently described approach are listed in 
the following: 


[R1] - Allow the integration of several related concepts such as modeling, decision making, pattern recognition, etc.;

[R2] - Integrate the progressive incorporation of knowledge that characterizes scientific advance;

[R3] - Accommodate the tension between specificity in modeling datasets bijectively and the generality implied by every data element in those sets non-injectively satisfying the same associated model;

[R4] - Allow the representation of data elements in terms of respective features (measurements), as is typical in pattern recognition;

[R5] - Allow the identification of the main challenges in modeling and other related areas;

[R6] - Provide support for better understanding clustering, complexity, complex networks, ontologies, collaborative science, deep learning and creativity, among other possibilities;

[R7] - Adhere to both model- and data-driven perspectives;

[R8] - Lead to an effective modeling methodology that can eventually be automated in software and/or hardware engines;

[R9] - Be relatively formal but nevertheless remain accessible, while also maintaining good didactic potential;

[R10] - Allow the incorporation of stochasticity related to datasets and modeling;

[R11] - Allow the incorporation of the tuning role of parameters in scientific modeling;

[R12] - Pave the way to compositions of models, in the sense that the modeling results can be fed back as input or passed into other modeling systems;

[R13] - Be as congruent as possible with the human understanding of modeling and pattern recognition, as well as of many of the involved concepts;

[R14] - Account for the fact that modeling and pattern recognition depend on sets of available or selected data, varying in size and generality from a few data elements to the whole physical world;

[R15] - Allow multiple data elements to be queried simultaneously, as motivated by modeling, while still providing good performance when applied to single individuals.


3 Mapping Datasets into Models 


As approached in this work, the basic operation in modeling is the mapping of datasets into respective models. It is therefore important to discuss this operation in more detail, which is the main objective of this section. More specifically, we will develop a line of reasoning that allows a bijective association to be established between the datasets and the respective models. Recall that, mathematically, a bijective association (a bijection) between two sets pairs every element of the first set with exactly one element of the second, and vice versa; that is, it is both injective (one-to-one) and surjective (onto), and can therefore be inverted. Bijective associations are particularly important because they can be understood as implementing a network of causal relationships between the several involved components.

For simplicity's sake, and for all subsequent purposes in this work, this type of relationship may be understood as establishing an identity or bridge between the dataset and model domains.

Other important issues include understanding how parameters can be accommodated into models, the need to quantify the properties of the data elements in terms of respective features, the several possible types of models, and the several manners in which models can be progressively developed. It is hoped that the concepts and discussions developed in this section provide a sound basis for building the sought meta model, as well as for identifying and discussing the possible limitations of mapping datasets into models, which will be addressed in the subsequent section. Henceforth, meta modeling is understood as the endeavor of modeling how models are built and developed.

We start by presenting, in Figure 2, four types of mappings that may take place between the datasets ω_i in the environment E and the respective models in the model framework M.

In Figure 2(a) we have a non-injective mapping, in which more than one dataset ω of E is mapped into the same model m in M. Though both of these datasets can be understood as being explained by that model, it is impossible to distinguish between the two original datasets from their common image. A non-surjective mapping is illustrated in Figure 2(b), in which some of the models in M have not been verified with respect to any of the existing datasets in E. This situation could be informally understood as a "model waiting for a dataset", and it is also unwanted because we have models that are not verified. The situation depicted in Figure 2(c) corresponds to a bijective mapping from the elements of E into M, which is therefore invertible, unlike the two previous situations.

Figure 2: Four possible types of mappings, here understood in the mathematical sense, from datasets in E into models in M. (a) Non-injective mapping, meaning that more than one dataset is associated with the same model. (b) Non-surjective mapping, in which one model in M is not verified for any dataset in E. (c) The bijective situation, in which each dataset is associated with a single model and every model ends up associated with a dataset. The situation in (c) is critically important because it means that the mapping can be inverted. (d) Formally speaking, though this mapping cannot be characterized as a function (a restriction that is relaxed in some approaches), it represents a relatively common situation while mapping data into models, and can be easily addressed by merging the models associated with the same dataset, yielding an injective map.

The critical importance of adopting a bijective, invertible mapping of datasets into models resides in the fact that this type of relationship avoids both the ambiguity of non-injective mappings and the existence of unverified models.
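For finite collections of datasets and models, these properties can be checked directly. In this minimal sketch, assumed for illustration, a mapping from datasets to models is a Python dictionary (and therefore automatically a function); the dataset and model labels and the helper names are hypothetical.

```python
from typing import Dict, Hashable, Set


def is_injective(mapping: Dict[Hashable, Hashable]) -> bool:
    """No two datasets are mapped into the same model (rules out Figure 2(a))."""
    return len(set(mapping.values())) == len(mapping)


def is_surjective(mapping: Dict[Hashable, Hashable], models: Set[Hashable]) -> bool:
    """Every model in the framework is verified by some dataset (rules out Figure 2(b))."""
    return set(mapping.values()) == models


def invert(mapping: Dict[Hashable, Hashable], models: Set[Hashable]) -> Dict[Hashable, Hashable]:
    """Return the inverse mapping (models to datasets), which exists only for bijections."""
    if not (is_injective(mapping) and is_surjective(mapping, models)):
        raise ValueError("mapping is not bijective, so it cannot be inverted")
    return {model: dataset for dataset, model in mapping.items()}


models = {"m1", "m2", "m3"}
bijective = {"w1": "m1", "w2": "m2", "w3": "m3"}
non_injective = {"w1": "m1", "w2": "m1", "w3": "m2"}  # m3 is also left unverified

print(invert(bijective, models))  # {'m1': 'w1', 'm2': 'w2', 'm3': 'w3'}
print(is_injective(non_injective), is_surjective(non_injective, models))  # False False
```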

A fourth situation is worth considering, and it has to do with mappings that, by not adhering to the usual concept of a mathematical function, allow more than one model to be associated with the same dataset, as illustrated in Figure 2(d). This situation can be addressed by understanding that those multiply satisfied models actually correspond to the same model (giving rise to a subset of features), yielding a respective injective map. As we will see, this type of mapping is also relatively common regarding data elements, but it cannot occur when a bijective association is to be established between datasets and models. This situation may also be caused by insufficient sampling of data or by errors.

Interestingly, while in our approach the mapping between datasets and models is henceforth understood as a bijective association, all the data elements inside each dataset ω_i map in a non-injective manner into the same model m_i, and are therefore not subjected to a bijective association. At the same time, any data element may belong to more than one dataset.
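The two levels of mapping can be contrasted in a small sketch with hypothetical datasets: the datasets map bijectively into models, while all elements of a dataset map into that dataset's model, and an element shared by two datasets is consequently associated with more than one model.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Set

# Hypothetical datasets (sets of data elements) and their bijectively associated models.
datasets: Dict[str, FrozenSet[str]] = {
    "w1": frozenset({"a", "b", "c"}),
    "w2": frozenset({"c", "d"}),
}
dataset_to_model = {"w1": "m1", "w2": "m2"}

# Element level: every element of a dataset is mapped into that dataset's model.
element_to_models: Dict[str, Set[str]] = defaultdict(set)
for name, elements in datasets.items():
    for element in elements:
        element_to_models[element].add(dataset_to_model[name])

# Dataset level: one-to-one correspondence between datasets and models.
assert len(set(dataset_to_model.values())) == len(dataset_to_model)
# Element level: many elements share the same model (non-injective).
print(sorted(e for e, ms in element_to_models.items() if "m1" in ms))  # ['a', 'b', 'c']
# An element belonging to two datasets satisfies both models.
print(sorted(element_to_models["c"]))  # ['m1', 'm2']
```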





The fact that greater freedom of mapping is allowed in the case of data elements is actually welcome, because it disentangles the seemingly opposite requirements, in modeling and pattern recognition, of specificity of models with respect to whole sets of data and generalization of models with respect to individual data elements. From the pattern recognition perspective, it means that all data elements in a given ω_i belong to the same category defined by the respective model m_i, which seems to be quite reasonable.

Scientific models can be understood as involving variables, constants, and parameters, among other possible components. Variables include all quantities that may vary during an experiment; constants refer to quantities that never vary during or between experiments; and parameters are quantities that may vary from one experiment to another. Variables are often subdivided into dependent and independent (or free). As implied by its name, a variable is said to be dependent in case it is expressed in terms of the others in a given model. It is interesting to observe that the concept of variable dependence is relative to each specific model, because a variable that is dependent in one model may be independent in another.

In the case of a simple pendulum, we have time as a free variable, and the angular position and speed as variables dependent on time. The mass of the bob and the length of the rod correspond to parameters. The gravitational acceleration can very probably be taken as a constant, given that it is difficult to change its value in a laboratory.
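The roles of these modeling elements in the simple pendulum can be made explicit in code. This sketch assumes the standard small-angle solution θ(t) = θ₀ cos(√(g/L) t) for a bob released from rest, which is not part of the original text: time is the free variable, angle and angular speed are the dependent variables, rod length and bob mass are parameters, and g is a constant. The initial amplitude θ₀ is included as a further quantity that may change between experiments. Under this approximation the bob mass does not affect the motion.

```python
from dataclasses import dataclass
from math import cos, sin, sqrt
from typing import Tuple

G = 9.81  # constant: gravitational acceleration (m/s^2), fixed in the laboratory


@dataclass(frozen=True)
class PendulumParameters:
    """Quantities that may change from one experiment to another."""
    length: float     # rod length L (m)
    mass: float       # bob mass (kg); does not enter the small-angle solution below
    amplitude: float  # initial angle theta_0 (rad), bob released from rest


def angular_state(t: float, p: PendulumParameters) -> Tuple[float, float]:
    """Dependent variables (angle, angular speed) as functions of the free variable t.

    Assumes the small-angle approximation theta(t) = theta_0 * cos(omega * t).
    """
    omega = sqrt(G / p.length)
    theta = p.amplitude * cos(omega * t)
    theta_dot = -p.amplitude * omega * sin(omega * t)
    return theta, theta_dot


p = PendulumParameters(length=1.0, mass=0.2, amplitude=0.05)
print(angular_state(0.5, p))
```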

These three main types of modeling elements can be immediately associated with pattern recognition concepts: variables are the measurements (or features) of the data; constants are constants; and parameters correspond to adjustments influencing the measurements or decisions. For instance, in a neural network the parameters would correspond to the weights and bias of each neuron, or could refer to the smoothing level adopted while simplifying images. In a physical model, parameter tuning allows a specific phenomenon to fit the respective model.

The proper setting of parameters is critical for modeling and pattern recognition, since parameters directly influence the decisions. In the present work, we understand that the parameters are always adjusted so as to guarantee the bijective association between datasets and respective models. This adjustment can be made through some optimization procedure, varying the parameter values until no decision errors occur. The more frequent situation involving possible decision errors will be addressed in Section 4. It is also possible to search for suitable parameter values during data analysis and model building.
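As an illustration of adjusting a parameter so that a decision makes no errors, the sketch below searches for a threshold on a single feature that perfectly separates two datasets. The feature values, the decision rule "feature > threshold", and the exhaustive search over midpoints are hypothetical choices; realistic settings would typically use a proper optimization procedure and would have to tolerate errors.

```python
from typing import List, Optional, Sequence


def decision_errors(threshold: float, positives: Sequence[float],
                    negatives: Sequence[float]) -> int:
    """Count misclassifications of the rule 'feature > threshold means positive'."""
    return sum(x <= threshold for x in positives) + sum(x > threshold for x in negatives)


def tune_threshold(positives: Sequence[float],
                   negatives: Sequence[float]) -> Optional[float]:
    """Return a threshold yielding zero decision errors, or None if none exists."""
    values: List[float] = sorted(set(positives) | set(negatives))
    # Candidate parameter values: midpoints between consecutive observed feature values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    for t in candidates:
        if decision_errors(t, positives, negatives) == 0:
            return t
    return None


species_a = [2.1, 2.4, 2.8]  # hypothetical feature values of one dataset
species_b = [0.9, 1.3, 1.5]  # hypothetical feature values of another dataset
print(tune_threshold(species_a, species_b))  # a value between 1.5 and 2.1
```

When the two datasets overlap in this feature, no error-free threshold exists and the function returns None, which corresponds to the situations with unavoidable decision errors.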

So far, we have assumed that the data elements in each dataset can be directly operated on by the model in order to verify their adherence. However, the practical analysis of a given dataset by a model is, in general, impossible unless the elements in this dataset have first been properly represented in terms of a set of categorical or quantitative features, also called measurements, characteristics, attributes, or properties in the pattern recognition area. Therefore, a further level of mapping needs to be incorporated into modeling and pattern recognition, extending from datasets into feature sets.

The diagram in Figure 3 illustrates how a given dataset ω_i can be mapped into respective features f_j, j = 1, 2, ..., m, which are then sent to the respective model m_i.

Figure 3: The modeling or recognition of a dataset ω_i derived from a respective universe Ω requires each data element to be first mapped into a set of categorical and/or quantitative features f_j, j = 1, 2, ..., m, which defines, in a bijective manner, a new, feature-based dataset. For simplicity's sake, these feature-associated datasets will not be shown in the other diagrams in this work.


Observe that the description of a dataset ω_i therefore gives rise to a transformed, feature-based version of that dataset, upon which the models can now objectively operate. For simplicity's sake, this type of dataset will be omitted from the other figures and diagrams of this work, but it will nevertheless be understood to be present.

It is not often realized that features are always present in typical decision, modeling and recognition tasks, including the measurements representing the very entities of interest. For instance, before we can decide whether the presented entity is a dog, it needs to be transformed into an image by our visual system or, typically, into respective matrices in an artificial system. Features can be transformed and combined in endless manners, but the results can always be understood as features.

As we learn from the pattern recognition area, selecting a proper set of features to describe the analyzed entities is often quite a challenge. Observe that the number of features involved in a model defines a respective multidimensional feature space. Ideally, each data element should be mapped into a common feature space in a bijective manner, so as to establish a bijective association between the data elements and their representations in terms of the considered features. It is also particularly important to identify the smallest set of features that may allow a problem to be reasonably solved. In Figure 3, though all features have been connected through bijective arrows, it may also be the case that no single feature is bijective by itself, while the set of all considered features does provide a bijective association.
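The remark that individual features need not be injective while the whole feature set is can be checked on a toy example. The four hypothetical data elements below are described by two categorical features; neither feature alone distinguishes all the elements, but the pair does. Enumerating feature subsets by increasing size is also the simplest (exhaustive) way of finding a smallest sufficient set of features.

```python
from itertools import combinations
from typing import Dict, Tuple

# Hypothetical data elements described by two categorical features (color, shape).
features: Dict[str, Tuple[str, str]] = {
    "e1": ("red", "round"),
    "e2": ("red", "square"),
    "e3": ("blue", "round"),
    "e4": ("blue", "square"),
}


def is_injective_on(feature_indices: Tuple[int, ...]) -> bool:
    """True if the selected features give every data element a distinct representation."""
    images = {tuple(vec[i] for i in feature_indices) for vec in features.values()}
    return len(images) == len(features)


for k in (1, 2):
    for subset in combinations(range(2), k):
        print(subset, is_injective_on(subset))
# (0,) False
# (1,) False
# (0, 1) True
```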

The choice and number of features required for establishing a bijective mapping between data and models also depend on the existing data, as well as on the intrinsic characteristics and level of heterogeneity of the data elements. For instance, if the existing elements are very similar, more features will be required. Conversely, more heterogeneous datasets tend to require fewer features.
Though the features are assumed to provide a complete, invertible representation of the datasets in our first approach (M*), which is necessary to maintain the bijective association between data elements and respective models, it is also possible to subsequently adapt this same model to situations in which the features no longer provide an invertible mapping of the data elements. For consistency of modeling, we also assume that all data elements in E are always characterized in terms of every considered and applicable feature. Observe that a data element from which a feature used in a given existing model m_i cannot be derived may automatically be prevented from satisfying that model, in case the dataset cannot be described in terms of other features. At the same time, a feature could be missed that represents the only manner to discriminate between two distinct datasets.





We have so far addressed several points related to the 
data elements, datasets, types of mapping of the latter 
into models, and features. Now, we approach the model- 
ing level itself. 

It is important to keep in mind that any model can be immediately associated with a decision or categorization, namely whether the dataset satisfies the model or not, or the extent to which the model adheres to the dataset. There are several types of possible models/decisions: thresholds, rules, equations, descriptions, etc. Any of these may be involved in the modeling considered henceforth.

It is also interesting to divide the possible models into two major groups: (a) those that seek an optimum (minimum or maximum) of some merit figure; and (b) those aimed at achieving a given property within a reasonable margin of accuracy. While the former type of problem is directly related to the ample and important area of mathematical optimization, the latter involves defining some margin of tolerance and working with probabilities.

Merit or fitness figures can be associated with each obtained model, reflecting the requirements specific to each problem. Possible merit figures include the length of the model description, its intelligibility to humans, and the cost of checking whether a dataset satisfies a model, among many other possibilities. A particularly interesting objective is, given a new dataset, to find the largest existing dataset entirely containing it, as this would account for the most general explanation of that dataset. In this case, the larger dataset will nevertheless have to be restricted if one wants to keep the bijective association. Also, unless the smaller dataset has some special significance, it could therefore be subsumed into the larger model. Several such simplifications and specifications are allowed by the proposed meta-modeling approach. Another particularly interesting situation concerns finding, given a set of datasets, the interrelationships between them.
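A brute-force sketch of the "largest containing dataset" objective, assuming datasets are stored as Python frozensets under hypothetical names; for large environments an index structure would likely be preferable to a linear scan.

```python
from typing import Dict, FrozenSet, Optional


def largest_container(new: FrozenSet[int], existing: Dict[str, FrozenSet[int]]) -> Optional[str]:
    """Name of the largest existing dataset that entirely contains `new`, or None."""
    candidates = [name for name, w in existing.items() if new <= w]
    if not candidates:
        return None
    return max(candidates, key=lambda name: len(existing[name]))


# Hypothetical environment E with three datasets.
E = {
    "w1": frozenset({1, 2, 3, 4, 5, 6, 7}),
    "w2": frozenset({2, 3, 4, 5}),
    "w3": frozenset({8, 9}),
}
print(largest_container(frozenset({2, 3, 4}), E))  # w1
print(largest_container(frozenset({1, 9}), E))     # None
```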


4 Limitations in Mapping Datasets into Models

Several components of our meta-modeling approach have 
been presented and discussed in the previous section. 
Now, we address some of the most common types of lim- 
itations and constraints related to those components. 

Given a dataset $w_i$ of interest, it is possible that one or more of its data elements have been assigned by mistake, or that other data elements are missing. In these cases, we will have a dataset that does not fully correspond to our expectations. Let us illustrate this situation with the following example. Let $w$ be a dataset that has been singled out for modeling because its data elements are associated with a posited new plant species. Spurious samples from other species may be included in $w$, while other samples of the considered species are overlooked, e.g. by some sampling procedure. These situations will imply inconsistencies leading to an incorrect model being identified for that dataset, and will probably lead to less accurate and incomplete model identification and combination.

Missing data elements are characteristic of the sampling that is unavoidably required in the case of infinite or very large sets of data elements.
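The following sketch illustrates both problems on hypothetical labels. It assumes, purely for illustration, that the true membership of the species is known, which in practice it is not; a single spurious element is enough for the sampled dataset to fail the model, while overlooked elements go unnoticed.

```python
from typing import Callable, Set, Tuple


def audit_sample(sample: Set[str], species: Set[str],
                 model: Callable[[str], bool]) -> Tuple[Set[str], Set[str], bool]:
    """Compare a sampled dataset with the (normally unknown) true species."""
    spurious = sample - species                    # elements assigned by mistake
    missing = species - sample                     # elements overlooked by sampling
    consistent = all(model(x) for x in sample)     # does the whole sample satisfy the model?
    return spurious, missing, consistent


# Hypothetical labels: "a*" belong to the posited species, "b*" to another one.
species = {"a1", "a2", "a3", "a4"}
is_species_a = lambda x: x.startswith("a")
sample = {"a1", "a2", "b7"}

spurious, missing, consistent = audit_sample(sample, species, is_species_a)
print(sorted(spurious), sorted(missing), consistent)  # ['b7'] ['a3', 'a4'] False
```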


Errors may also occur while mapping datasets into features. These may include mistakes, finite resolution or noise while measuring the features. It is also possible that the equations or programs used to estimate the features are intrinsically incorrect, leading to improper characterization. Errors taking place while measuring or calculating features can severely impact the identification of a valid model for the given dataset. Another related problem concerns the fact that a feature that is critically necessary for obtaining a model for a given dataset may be overlooked or unknown.
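As an illustration of finite resolution, the hypothetical sketch below quantizes a measured length and applies a threshold model to it: a true value slightly above the threshold is read as exactly the threshold, and the element no longer satisfies the model.

```python
def quantize(value: float, resolution: float) -> float:
    """Simulate a measuring device with finite resolution."""
    return round(value / resolution) * resolution


def longer_than_two(length: float) -> bool:
    """Hypothetical threshold model on a single length feature."""
    return length > 2.0


true_length = 2.04
measured = quantize(true_length, 0.1)
print(longer_than_two(true_length))  # True
print(longer_than_two(measured))     # False: 2.04 is read as 2.0
```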

Another possible source of errors takes place at the 
modeling level itself. Here, we may have inconsistent de- 
cisions defined in terms of the features, logical errors, or 
the overlooking of some important features. 

An important type of error, often overlooked in modeling and pattern recognition, arises when some of the data elements in the data environment $E$ have not yet been checked against every existing model, which may also undermine the obtained results.


5 The M* Meta-Model 


Having discussed some of the main aspects and components involved in mapping from datasets into models, as well as their possible limitations, we are now in a position to develop a more principled and relatively formal meta-model that can account for as many of the requirements listed in Section 2 as possible.

We start with the overall structure depicted in Figure 4, involving a finite number of possible data elements represented as a universe set $\Omega$. These basic data elements $x_j$ are henceforth assumed to be finite in number, with $j = 1, 2, \ldots, N_\Omega$. Observe that the largest possible environment $E$ corresponds to the power set of $\Omega$, containing $2^{N_\Omega}$ subsets.
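For a tiny universe, the power set can be enumerated directly with the standard `itertools` module, which confirms the $2^{N_\Omega}$ count. The element names are hypothetical, and the exponential growth makes this infeasible beyond a few dozen elements (for $N_\Omega = 30$ there are already more than a billion subsets).

```python
from itertools import chain, combinations
from typing import FrozenSet, Iterable, List


def power_set(universe: Iterable[str]) -> List[FrozenSet[str]]:
    """All subsets of the universe, i.e. the largest possible environment E."""
    items = sorted(universe)
    return [frozenset(c)
            for c in chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))]


omega = {"x1", "x2", "x3"}          # a toy universe with N_Omega = 3
E_max = power_set(omega)
print(len(E_max), 2 ** len(omega))  # 8 8
```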

The observable (or available, or restricted) collection of subsets of $\Omega$ is understood to constitute the environment $E$ upon which models can be built, being therefore accessible as a set of datasets $w_i$, $i = 1, 2, \ldots, N_w$, each of which is composed of data elements. The initial configuration of a given model framework can be associated with the respectively assumed postulates or hypotheses.

The data elements in $\Omega$ may or may not be equiprobable (i.e., occur in the same number). Interestingly, both cases can be addressed identically by the proposed framework, though in the non-equiprobable case some data elements will be less likely to be incorporated into $E$ (they may take a long time to appear). Probabilistic situations can be approached by using the probabilistic variant of the framework described in a subsequent section.

Observe that the same data element may appear in more than one dataset, as it may individually satisfy more than one model. This property is reasonable and compatible with our concept of modeling and recognition, because the same entity can indeed satisfy several models. For instance, a cat is a feline, but also a mammal, and it has a tail. Each of these decisions is normally taken as a valid category, though we may be particularly interested in some more specific or general property.

Figure 4: The M* meta model, decision taking, and pattern recognition. The universe set $\Omega$ contains all possible data elements $x_j$ (small green circles), which can be successively drawn into environmental datasets $w_i$, $i = 1, 2, \ldots, N_w$, so that $w_i \subseteq \Omega$, which define the current data environment $E$. Each of these datasets may eventually become associated with a respective model $m_i$ necessarily explaining every element of $w_i$. The set of existing models is understood to correspond to the modeling framework $M$. The adherence between the data elements in the available datasets and the existing models needs to be continuously updated in order to ensure overall consistency. The critically important bijective association between datasets and models precludes a dataset from being associated with more than one model, and vice versa, although this case can also be easily accommodated into the M* framework.

The basic component of the M* approach, henceforth called the cartouche, is illustrated in Figure 5.

Observe that the cartouche concept encapsulates a basic and important principle of the scientific method: a model is held valid until it is found not to work for even a single experiment (data element). The cartouche also intrinsically integrates the two basic domains that are essential for both scientific modeling and pattern recognition, namely datasets and models. From the perspective of scientific modeling, this property emphasizes the importance of theoretical constructions adhering to existing experimental data, as well as the essential role of the bijective representation of data in terms of features. From the point of view of pattern recognition, emphasis is placed on the importance of obtaining models for the data.

Figure 5: The cartouche. Corresponding to the basic component of the M* approach, the cartouche involves a set of data elements $x_{i,j}$ contained in a dataset $w_i$ belonging to a dataset environment $E$, and a respective model $m_i$. The bijective pairing between the dataset $w_i$ and its respective model $m_i$ requires that the model is satisfied (True output) whenever every data element in $w_i$ satisfies the logic condition corresponding to the model. Observe that some of the data elements are allowed to map into other models, and that the maps fed from other models into $m_i$ have no effect on the model validity. However, only the model $m_i$ can be associated bijectively with the dataset $w_i$, implying that the latter cannot map completely into any other model; if it does so, the two models can be deemed to be equivalent. All the above conditions can be summarized in the mathematical statement $(w_i, m_i)$, expressing that there is a bijective association between the dataset $w_i$ and the respective model $m_i$, which is therefore understood to fully explain that dataset. Though the feature level has not been included in this diagram for simplicity's sake, more information about this important level can be found in Section 13.

The bijective association between the current data environment $E$ and the respective modeling framework $M$ can be expressed as $(E, M)$, henceforth called the current theory. The cartouche can also be immediately extended to reflect that the interconnection between datasets and models is typically accomplished through sets of features $F$, yielding the triple $(E, F, M)$.

It is also interesting to define subsets of models in order to organize distinct modeling approaches. For instance, one can organize the existing models in terms of the mathematical methods used (e.g. derivatives, set theory, integrals, delays, etc.) or the type of features employed.

Figure 6 presents a zoomed view of a hypothetical modeling situation, illustrating some of the important features regarding the association between the dataset elements and the respective models.

Figure 6: Zooming into a hypothetical situation illustrates the fact that all data elements (green) in a dataset $w_i$ necessarily map into the same model $m_i$ in a non-injective manner. Each of the arrows extending from a data element $x_j$ to a respective model $m_i$ indicates that $x_j$ satisfies $m_i$. At the same time, a bijective mapping is ensured between each dataset and its respective model. This feature of the proposed framework is essential for disentangling these two seemingly conflicting characteristics of modeling. Though each of the data elements in a pair $(w_i, m_i)$ does satisfy the model $m_i$, this model is only understood to be fully satisfied with respect to all the elements in $w_i$. The figure also shows that the same data element belonging to one of the existing datasets can map into more than one model. The features layer has been omitted for simplicity's sake.


At each instant, the suggested meta model is understood to incorporate $N_m$ models $m_i$, $i = 1, 2, \ldots, N_m$. Each of the datasets $w$ may become associated with one and only one respective model $m$, it being henceforth understood that every element of $w$ will satisfy the respective model $m$, and vice versa, so that a bijective association is consequently established between each dataset and the respective model. Observe that this aspect of the M* framework actually defines two scales or levels of modeling, one at the data element level, and another of higher hierarchy at the dataset level. Also, observe that the mappings from data elements of a dataset $w_i$ that do not lead to the bijectively paired $m_i$ are not considered at this higher level of associations, but only at the lower data element level. In this way, we accommodate the fact that one of the data elements in a dataset may individually satisfy more than one model, without implying that the dataset to which it belongs is assigned to other models. In other words, the association between a dataset and a model takes place only when all its data elements satisfy that model.
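The two levels of association can be sketched with a small data structure. The `Cartouche` class below is a hypothetical illustration (it is not defined in the framework): a single element may be linked to several models, while a dataset is linked only to the models satisfied by all of its elements.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List


@dataclass(frozen=True)
class Cartouche:
    """A dataset-model pair (w_i, m_i); the class name and fields are illustrative."""
    name: str
    dataset: FrozenSet[int]
    model: Callable[[int], bool]

    def is_valid(self) -> bool:
        # Dataset level: the pair holds only if every element satisfies the model.
        return all(self.model(x) for x in self.dataset)


def element_links(x: int, pairs: List[Cartouche]) -> List[str]:
    """Element level: every model satisfied by a single data element."""
    return [c.name for c in pairs if c.model(x)]


def dataset_links(c: Cartouche, pairs: List[Cartouche]) -> List[str]:
    """Dataset level: models satisfied by *all* elements of the dataset of c."""
    return [d.name for d in pairs if all(d.model(x) for x in c.dataset)]


even = Cartouche("m_even", frozenset({2, 4, 6}), lambda x: x % 2 == 0)
small = Cartouche("m_small", frozenset({1, 2, 3}), lambda x: x < 4)
pairs = [even, small]

print(even.is_valid(), small.is_valid())  # True True
print(element_links(2, pairs))            # ['m_even', 'm_small']
print(dataset_links(even, pairs))         # ['m_even']
```

Element 2 satisfies both models, yet the dataset of `even` is linked only to its own model, mirroring the distinction drawn in the text.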

The current set of available datasets $w_i$ is henceforth called the data environment $E$, while the existing models are henceforth understood to constitute the modeling framework $M$. The set containing all elements in any of the datasets $w_i$ of $E$ is henceforth represented as $S_E$.

Taken jointly, these two sets may be related to the concept of current knowledge. Recall that, at any time, new data elements can be drawn from $\Omega$ and define new datasets that can eventually become important enough to be the subject of modeling. Examples of this possibility include, but are by no means limited to, the appearance of a new species of living beings, the discovery of new stars, the birth of new individuals of a given species, and the invention of new technological devices. As addressed in more detail in Section 6, it is also possible that two or more existing datasets (or dataset-model pairs) be combined through set operations such as union, intersection, complementation or difference.

To ensure the consistency of the proposed framework, the datasets are updated continuously, in the sense that any new element drawn from $\Omega$ is checked against each existing model and, in case it satisfies that model, incorporated into the associated dataset before any combination of models is contemplated. In addition, all the existing data elements are continuously checked against any newly incorporated model.
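The update discipline described above can be sketched as follows, under the simplifying assumptions that data elements are integers, models are Boolean predicates, and each dataset-model pair shares a key in two dictionaries (the function names are hypothetical).

```python
from typing import Callable, Dict, Set

Model = Callable[[int], bool]


def add_element(x: int, datasets: Dict[str, Set[int]], models: Dict[str, Model]) -> None:
    """A newly drawn element is checked against every existing model."""
    for name, m in models.items():
        if m(x):
            datasets[name].add(x)


def add_pair(name: str, w_new: Set[int], m: Model,
             datasets: Dict[str, Set[int]], models: Dict[str, Model]) -> None:
    """A new dataset-model pair is incorporated consistently (w_new is assumed to satisfy m)."""
    for x in w_new:                        # 1) new elements vs. existing models
        add_element(x, datasets, models)
    s_e = set().union(*datasets.values(), w_new)
    models[name] = m                       # 2) existing elements vs. the new model
    datasets[name] = {x for x in s_e if m(x)}


models: Dict[str, Model] = {"even": lambda x: x % 2 == 0}
datasets: Dict[str, Set[int]] = {"even": {2, 4}}

add_element(6, datasets, models)
add_pair("small", {1, 3}, lambda x: x < 5, datasets, models)
print(datasets)  # {'even': {2, 4, 6}, 'small': {1, 2, 3, 4}}
```

Element 2 and element 4, already present in $S_E$, are absorbed into the new dataset because they satisfy the new model.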





A more complete, expanded representation of the meta model M* is depicted in Figure 7, which also incorporates the features associated with each dataset.

Observe that the bijective association between datasets and their respectively associated models is maintained by the new feature layer $F$, as reflected in the confluence of each dataset into a single model even if multiple features were involved in the respective representation. In addition, observe that the same type of feature may be adopted for the characterization of more than one dataset. The incorporated layer consisting of features is henceforth understood to constitute a new layer, called the feature layer $F$.

Given a bijective pairing $(w_i, m_i)$, a decision can be immediately assigned to the model, corresponding to its verification by the dataset. Therefore, each model corresponds to a question or decision, a feature that is directly related to the concept of causality.

There are other interesting situations that can also be taken into account while integrating data and models into the current framework. For instance, when a new dataset $w_i$ is found to be entirely contained in one of the existing datasets $w_j$, therefore being a subset of the latter, a restriction of the model $m_j$ can be assigned to $w_i$. For instance, if $w_j = \{1, 2, 3, 4, 5, 6, 7\}$ and $w_i = \{2, 3, 4\}$, the new dataset can be paired with a correspondingly restricted model associated with the set operation $w_i = w_j - (w_j - w_i)$. Interestingly, this sequence of reasoning implies recursion, which can be approached by initially assigning a provisional dummy model $m_i$.

Figure 7: The meta model M* expanded to incorporate the features respectively describing each of the datasets. It should be recalled that models can only operate on data that has been categorically or quantitatively specified in an objective manner. In addition, more than one dataset can be characterized in terms of the same type of feature.
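The restriction in the example above can be checked directly. The sketch below verifies the set identity and then builds the corresponding restricted model, using the fact that a set difference $w_j - w_i$ corresponds to $m_j \wedge \neg m_i$. The model `m_j` is a hypothetical stand-in, and the recursion on $m_i$ is broken by a dummy model that merely labels $w_i$.

```python
# Hypothetical existing pair (w_j, m_j) and a new dataset w_i contained in w_j.
w_j = frozenset({1, 2, 3, 4, 5, 6, 7})
w_i = frozenset({2, 3, 4})
assert w_i <= w_j

restricted = w_j - (w_j - w_i)       # data domain: w_i = w_j - (w_j - w_i)
print(sorted(restricted))            # [2, 3, 4]

m_j = lambda x: 1 <= x <= 7          # hypothetical model paired with w_j
m_i_dummy = lambda x: x in w_i       # provisional "dummy" model: a mere label of w_i

# Model domain: m_j AND NOT (m_j AND NOT m_i); the recursion on m_i is broken by the dummy.
m_i = lambda x: m_j(x) and not (m_j(x) and not m_i_dummy(x))
print(sorted(x for x in range(10) if m_i(x)))  # [2, 3, 4]
```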

In fact, a model can be associated with a dataset as soon as the dataset is given, through a labeling procedure that simply assigns an identifier (without semantic content) to that dataset. Evidently, this type of modeling does not contribute to the modeling framework and cannot lead to any decision being attached to the model other than providing an identifier for that dataset.

The above discussion suggests that assigning models to datasets that are subsets of the existing datasets may not only complicate the modeling framework, but also make it grow in a combinatorial manner. These cases can be more effectively addressed simply by recognizing that a new dataset is actually a subset of a larger dataset explained by a more comprehensive model.

Although the basic principle of the M* approach is to try to assign a model to every new dataset of interest, doing so when the new dataset is already fully contained within the existing datasets associated with models would only be justified if the new dataset has some special relevance requiring its discrimination, through a restriction, from the existing datasets. Otherwise, this type of new dataset can simply be ignored.
The fact that the same data element can map into two or more models motivates the use of measurements to quantify these interrelationships. One possibility consists simply of considering the number $n_{i,j}$ of associations between each data element $x_{i,j}$ in a dataset $w_i$ and all the existing models. Then, relative frequency histograms (or their moments) can be used to characterize each dataset (or model). Datasets leading to a relatively large average number of multiple connections $\langle n_{i,j} \rangle$ can be understood as being less specific, in the sense that the respective data elements are strongly related, though not through bijective relationships, with several models. It is also possible to count the number of individual data element connections received by a given model, as this value provides insights about the generality of the models.
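A minimal sketch of these counts, assuming integer data elements and predicate models with hypothetical names: for each element it computes $n_{i,j}$ (the number of existing models it satisfies), averages these per dataset, and counts the elements connected to each model.

```python
from statistics import mean
from typing import Callable, Dict, Set

Model = Callable[[int], bool]


def association_counts(datasets: Dict[str, Set[int]],
                       models: Dict[str, Model]) -> Dict[str, Dict[int, int]]:
    """n[i][x]: number of existing models satisfied by element x of dataset w_i."""
    return {i: {x: sum(1 for m in models.values() if m(x)) for x in w}
            for i, w in datasets.items()}


def model_indegree(datasets: Dict[str, Set[int]],
                   models: Dict[str, Model]) -> Dict[str, int]:
    """Number of elements of S_E connected to each model."""
    s_e = set().union(*datasets.values())
    return {k: sum(1 for x in s_e if m(x)) for k, m in models.items()}


models: Dict[str, Model] = {"even": lambda x: x % 2 == 0, "small": lambda x: x < 4}
datasets = {"even": {2, 4, 6, 8}, "small": {1, 2, 3}}

n = association_counts(datasets, models)
print({i: mean(c.values()) for i, c in n.items()})  # average <n_ij> per dataset
print(model_indegree(datasets, models))              # {'even': 4, 'small': 3}
```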

It is equally interesting to study the distribution of these histograms among the several models in the environment $E$. In case the datasets are found to have similar distributions of the $n_{i,j}$ statistics, the modeling framework can be understood as being more uniform.

Observe that the relationship between the datasets and the respective models can also be represented in terms of a bipartite network having weights corresponding to the number of data elements mapping from a dataset to a model. Such networks can provide valuable information about the overall structure, completeness, malleability, and robustness of the respective modeling framework. It would be of particular interest to devise means of enhancing a given scientific framework while taking into account the topological features of these networks. These networks should also present a hierarchical structure reflecting not only the data/model hierarchy, but also how the data elements are distributed among the existing models.
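The weighted bipartite network can be represented, as a sketch, by a nested dictionary of edge weights (a graph library could be used instead; none is assumed here). With the strict pairing, the weight from $w_i$ to its own model $m_i$ equals $|w_i|$, and any other nonzero weight records crossed connections. The datasets and models are hypothetical.

```python
from typing import Callable, Dict, Set

Model = Callable[[int], bool]


def bipartite_weights(datasets: Dict[str, Set[int]],
                      models: Dict[str, Model]) -> Dict[str, Dict[str, int]]:
    """W[i][k]: number of elements of dataset w_i that satisfy model m_k."""
    return {i: {k: sum(1 for x in w if m(x)) for k, m in models.items()}
            for i, w in datasets.items()}


models: Dict[str, Model] = {"even": lambda x: x % 2 == 0, "small": lambda x: x < 4}
datasets = {"even": {2, 4, 6, 8}, "small": {1, 2, 3}}

W = bipartite_weights(datasets, models)
print(W)  # {'even': {'even': 4, 'small': 1}, 'small': {'even': 1, 'small': 3}}
```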

An analogous approach can be adopted regarding the types of features in a given feature layer $F$.

While the M* approach requires the bijective pairing of datasets and models, it is also possible to consider the following situation (and related variations). Given a current data environment $E$ and a modeling framework $M$, it may happen that one of the datasets $w_i$, already paired with a respective model $m_i$ and thus defining the pairing $(w_i, m_i)$, becomes associated with another distinct model $m_j$. This corresponds to the mapping situation depicted in Figure 2(d). This is a case of particular interest because, though the mapping from $w_i$ to the models $m_i$ and $m_j$ is not a function (it is one-to-many), its inverse is a function. These situations can be easily accommodated into the M* approach simply by merging the two models $m_i$ and $m_j$ through their union or intersection, since $(w_i \cup w_j, m_i \vee m_j)$ and $(w_i \cap w_j, m_i \wedge m_j)$ are also valid pairs.

Reaching the most complete knowledge about $\Omega$ can be understood as the ultimate goal of modeling. This situation corresponds to having models associated with every possible subset $w \subseteq \Omega$. Quite interestingly, this can be done in several manners, including the following extreme approach: the model corresponding to each possible subset of $\Omega$ consists simply of enumerating its elements. The problem with this trivial solution is that not much is learned about the data elements and their grouping into datasets. Other approaches include the already discussed combination of existing models, as well as developing completely new models based on insights provided by the similarity between datasets, using the index A suggested in Section 4.


6 A Paired Algebra of Datasets and Models


Though the M* meta model has so far been considered in a mostly static manner, additional mechanisms may be incorporated, allowing the progressive derivation of new models and datasets. A possible approach is described in the current section, involving either the combination of datasets in terms of set operations or the integration of models by using logical connectives, which we henceforth refer to as a paired algebra of datasets and models. Yet another possibility, to be discussed elsewhere, is the presentation of new pairs of datasets and models.

It should also be taken into account that the proposed framework can be readily adapted to other types of models, e.g. by using production rules or compositions of functions as in neural networks. Interestingly, the heuristics usually employed by humans for taking decisions seem to be largely dependent on logical manipulations. This property of the M* framework is related to the fact that the consistency between models is always guaranteed in terms of the respective data consistency.

Incidentally, the M* framework and its derivations seem to be largely congruent with the way humans develop models, take decisions and perform pattern recognition.

The consistent combination of either datasets or models is immediately allowed by the fact that the pairing between datasets and models corresponds to a bijective association, which establishes a sound bridge between these two important domains. Under these circumstances, it immediately follows that set operations between datasets $w$ become intrinsically linked to logical manipulations of the respective models $m$. Some examples of the bijective associations between datasets and models are:

$$
\begin{array}{lcl}
\text{Dataset domain } E & \Longleftrightarrow & \text{Model domain } M \\[4pt]
w_k = w_i & \Longleftrightarrow & m_k = m_i \\
w_k = [w_i]^C & \Longleftrightarrow & m_k = \neg m_i \\
w_k = w_i \cup w_j & \Longleftrightarrow & m_k = m_i \vee m_j \\
w_k = w_i \cap w_j & \Longleftrightarrow & m_k = m_i \wedge m_j \\
w_k = w_i - w_j = w_i \cap [w_j]^C & \Longleftrightarrow & m_k = m_i \wedge \neg m_j \\
w_k = w_j - w_i = [w_i]^C \cap w_j & \Longleftrightarrow & m_k = \neg m_i \wedge m_j \\
w_k = [w_i \cup w_j]^C = [w_i]^C \cap [w_j]^C & \Longleftrightarrow & m_k = \neg m_i \wedge \neg m_j \\
w_k = [w_i \cap w_j]^C = [w_i]^C \cup [w_j]^C & \Longleftrightarrow & m_k = \neg m_i \vee \neg m_j \\
w_k = w_i \cup [w_j \cap w_p] & \Longleftrightarrow & m_k = m_i \vee [m_j \wedge m_p]
\end{array}
$$

where $[\cdot]^C$ stands for set complementation with respect to $S_E$. Observe also that to each set operation there corresponds a logical manipulation of models, and vice versa.
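The correspondences above can be checked mechanically on a small hypothetical $S_E$. The sketch assumes the ideal case in which each dataset is exactly the set of elements of $S_E$ satisfying its model; under that assumption, evaluating a combined model over $S_E$ must reproduce the paired set expression.

```python
from typing import Callable, FrozenSet

S_E = frozenset(range(1, 13))          # hypothetical set of all elements in E
Model = Callable[[int], bool]


def extension(m: Model) -> FrozenSet[int]:
    """Elements of S_E that satisfy model m."""
    return frozenset(x for x in S_E if m(x))


def comp(w: FrozenSet[int]) -> FrozenSet[int]:
    """Complementation with respect to S_E, i.e. [w]^C."""
    return S_E - w


m_i: Model = lambda x: x % 2 == 0
m_j: Model = lambda x: x % 3 == 0
w_i, w_j = extension(m_i), extension(m_j)

checks = {
    "complement": (comp(w_i), lambda x: not m_i(x)),
    "union": (w_i | w_j, lambda x: m_i(x) or m_j(x)),
    "intersection": (w_i & w_j, lambda x: m_i(x) and m_j(x)),
    "difference": (w_i - w_j, lambda x: m_i(x) and not m_j(x)),
    "De Morgan (union)": (comp(w_i | w_j), lambda x: not m_i(x) and not m_j(x)),
    "De Morgan (intersection)": (comp(w_i & w_j), lambda x: not m_i(x) or not m_j(x)),
}
for name, (w_k, m_k) in checks.items():
    assert extension(m_k) == w_k, name
print("all correspondences hold")
```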

Table 1 presents the 16 possible logical operations between two logical variables $m_i$ and $m_j$ yielding $m_k$, but 4 of them are not really useful for obtaining new combinations of models (and datasets): $m_k = \mathrm{TRUE}$, $m_k = \mathrm{FALSE}$, $m_k = m_i$ and $m_k = m_j$.
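The count of 16 operations, and of the 4 trivial ones, can be reproduced by enumerating all output columns of a two-input truth table, as in this minimal sketch (each operation is represented, by assumption, as a dictionary from input rows to outputs).

```python
from itertools import product
from typing import Dict, List, Tuple

Row = Tuple[bool, bool]
INPUTS: List[Row] = list(product([False, True], repeat=2))   # rows (X, Y): FF, FT, TF, TT


def all_binary_ops() -> List[Dict[Row, bool]]:
    """Each operation is identified by its output column over the four input rows."""
    return [dict(zip(INPUTS, column)) for column in product([False, True], repeat=4)]


ops = all_binary_ops()
trivial = [
    {r: False for r in INPUTS},   # m_k = FALSE
    {r: True for r in INPUTS},    # m_k = TRUE
    {r: r[0] for r in INPUTS},    # m_k = m_i (X)
    {r: r[1] for r in INPUTS},    # m_k = m_j (Y)
]
useful = [op for op in ops if op not in trivial]
print(len(ops), len(useful))  # 16 12
```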

It is also interesting to observe that it is possible to implement every logical operation in terms of $\neg(x \vee y)$ (NOR) or $\neg(x \wedge y)$ (NAND), among other possibilities, which in the data domain become $[x \cup y]^C$ and $[x \cap y]^C$, respectively.
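In the data domain, the NAND counterpart $[x \cap y]^C$ suffices to rebuild complementation, intersection and union, as this sketch on a hypothetical $S_E$ shows (all sets are assumed to be subsets of $S_E$).

```python
from typing import FrozenSet

S_E = frozenset(range(10))   # hypothetical set of all elements in E
Set_ = FrozenSet[int]


def nand(a: Set_, b: Set_) -> Set_:
    """Data-domain counterpart of NOT(x AND y): [a ∩ b]^C."""
    return S_E - (a & b)


def complement(a: Set_) -> Set_:          # [a]^C = [a ∩ a]^C
    return nand(a, a)


def intersection(a: Set_, b: Set_) -> Set_:   # a ∩ b = [[a ∩ b]^C]^C
    return complement(nand(a, b))


def union(a: Set_, b: Set_) -> Set_:          # a ∪ b = [[a]^C ∩ [b]^C]^C (De Morgan)
    return nand(complement(a), complement(b))


a, b = frozenset({0, 1, 2, 3, 4}), frozenset({3, 4, 5, 6, 7})
print(sorted(union(a, b)))  # [0, 1, 2, 3, 4, 5, 6, 7]
```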

It can be shown that there exist a total of $2^{2^n}$ possible logical operations between $n$ logical variables. As a consequence, the number of possible cases increases steeply with the hierarchy of the sets or models. However, the extensive computational resources currently available can be applied, possibly incorporating optimization techniques. At the same time, the incorporation of many hierarchical levels in the description of a newly obtained model also makes that description less tangible to human perception, and therefore more complex and abstract. For these reasons, the efficiency of the modeling framework greatly depends on the choice of models for the initial framework, in the sense that some of these choices may contribute to explaining a new dataset in terms of a relatively small (or, ideally, minimal) combination of the previous models.
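The count $2^{2^n}$ follows from a short argument: a truth table over $n$ logical variables has $2^n$ rows, and each row can independently be assigned TRUE or FALSE, so that

$$N(n) = 2^{2^n}, \qquad N(1) = 4,\quad N(2) = 16,\quad N(3) = 256,\quad N(4) = 65\,536.$$

The case $n = 2$ recovers the 16 operations of Table 1.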

The subject of obtaining new models through set operations between the existing datasets (or logical operations between models) is as extensive as it is interesting and cannot be fully addressed here. However, an interesting approach consists of employing intersection, union, complementation and difference between a small number of datasets. When translated to the human perspective, this intrinsic combinatorial complexity of possible models becomes closely related to the concept of complexity, because it becomes more and more expensive [7] to develop and understand highly hierarchical models. This provides a motivation for having experts in specific areas, who can substantially contribute to integrating other models through collaborative exchanges.

Table 1: The 16 possible logical operations between two logical (or Boolean) variables X and Y. Remarkably, some of them, such as "not", "and", and "or", are closer to human cognition in the sense of being more frequently employed. Provided the conditions for the bijective association between datasets and models are fulfilled, there will be a set operation corresponding to each of the 16 logical operations.

It is also interesting to observe that, instead of requiring a model to be satisfied by every data element of the respective dataset, it would also be possible to allow the validity of the model not to be restricted to 0 (false) or 1 (true), but to depend on a graded merit figure, such as the number of elements in the dataset that satisfy the model. The immediate implication of this is the loss of the bijective association between datasets and models. Now, instead of being underlain by formal logical consistency, the modeling approach becomes an optimization problem.
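A minimal sketch of the graded alternative, assuming integer elements and predicate models with hypothetical names: adherence is the fraction of elements satisfying a model, and choosing a model becomes a maximization, with no model necessarily reaching full adherence.

```python
from typing import Callable, Dict, Sequence

Model = Callable[[int], bool]


def adherence(dataset: Sequence[int], model: Model) -> float:
    """Graded merit figure: fraction of elements that satisfy the model."""
    if not dataset:
        raise ValueError("adherence is undefined for an empty dataset")
    return sum(1 for x in dataset if model(x)) / len(dataset)


def best_model(dataset: Sequence[int], models: Dict[str, Model]) -> str:
    """Optimization view: choose the model with the highest adherence."""
    return max(models, key=lambda k: adherence(dataset, models[k]))


w = [1, 2, 3, 4, 5]
models: Dict[str, Model] = {"even": lambda x: x % 2 == 0, "small": lambda x: x < 5}
print({k: adherence(w, m) for k, m in models.items()})  # {'even': 0.4, 'small': 0.8}
print(best_model(w, models))                             # small
```

Neither model reaches an adherence of 1.0, so in the strict M* sense neither would be paired with $w$.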





By blurring the frontier between data and models, the M* approach paves the way to several interesting possibilities, including the definition of a paired algebra of dataset and model queries and manipulations. By 'algebra' we mean the ability to represent datasets or models as variables, or symbols that can be solved or interrelated through set operations (in the case of datasets) or logical equations (models). The association of a new dataset with a model corresponding to a combination of the previously available models not only provides a way to account for the respective dataset; its intrinsic logical construction can also provide a computational means of deciding whether specific data elements satisfy that model. In addition, the obtained models may also provide indications about how the datasets were generated, sampled and obtained.

It is important to stress that combining the existing datasets while searching for a match needs to be preceded by including, in every existing dataset associated with a respective model, all the elements of the new dataset that satisfy that model, and by revising the overall consistency of the new state of the modeling framework. It is also possible to proceed gradually from new single data elements to the whole new dataset by progressively considering subsets of the latter having increasing sizes. For simplicity's sake, the present work will be restricted to updating the new individual elements and then considering the whole new dataset.

Another possibility accounted for by the proposed framework consists of performing logical combinations between the model statements and then seeking which among the existing datasets can satisfy a new modeling statement. This same mechanism immediately provides the means for building programs for obtaining specific properties (determined by the model) to characterize and analyze the respective datasets.

The above two possible approaches, aimed at combining datasets or combining models, are henceforth understood as being data-driven and model-driven, respectively. These two ways of integrating information and knowledge seem to correspond to the main manners in which humans perform these two important activities. A third possibility consists of evaluating the pairs of datasets and models from the perspective of the current modeling framework.

As an example, consider that the original dataset $w_4$ in Figure 7 becomes important enough to motivate the development of a respective model. We can approach the solution to this essential problem by searching for a combination of the datasets already explained by models that is identical to $w_4$. This can be done by checking the results of set operations between the already instantiated datasets, such as:

$$?w_i \cap ?w_j \equiv w_4 \tag{1}$$

Sought datasets are henceforth represented with a preceding question mark, i.e. $?w$, while the $\equiv$ symbol stands for being equivalent or identical.

In case it is verified that the datasets $w_2$ and $w_7$ satisfy Equation 1, by using set intersection, we can immediately derive the model of $w_4$ as necessarily corresponding to:

$$m_4 = m_2 \wedge m_7 \tag{2}$$

The above described development of a model for the dataset $w_4$ in Figure 7 is illustrated in Figure 8.
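A brute-force sketch of how Equation 1 (and its union and difference analogues) could be searched for. The dataset contents are hypothetical, since Figure 7 does not specify them, and the string templates merely print the model expression paired with each matching set expression.

```python
from itertools import permutations
from typing import Dict, FrozenSet, List

# Set operation -> (function, template of the paired logical expression, commutative?)
OPS = {
    "intersection": (lambda a, b: a & b, "m_{} ∧ m_{}", True),
    "union": (lambda a, b: a | b, "m_{} ∨ m_{}", True),
    "difference": (lambda a, b: a - b, "m_{} ∧ ¬m_{}", False),
}


def search(target: FrozenSet[int], datasets: Dict[str, FrozenSet[int]]) -> List[str]:
    """Brute-force search for ?w_i op ?w_j == target; returns the paired models."""
    hits = []
    for (i, w_i), (j, w_j) in permutations(datasets.items(), 2):
        for op, template, commutative in OPS.values():
            if commutative and i > j:      # skip the mirrored duplicate
                continue
            if op(w_i, w_j) == target:
                hits.append(template.format(i, j))
    return hits


# Hypothetical contents; the actual datasets of Figure 7 are not given in the text.
datasets = {
    "1": frozenset({8, 9}),
    "2": frozenset({1, 2, 3, 4, 5}),
    "7": frozenset({4, 5, 6, 7}),
}
print(search(frozenset({4, 5}), datasets))  # ['m_2 ∧ m_7']
```

Since the number of candidate combinations grows quickly with the number of datasets and operations, this exhaustive approach is only practical for small frameworks.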





The developed framework also allows the features of $m_4$ to be immediately inherited from the features originally associated with $m_2$ and $m_7$. As features $f_1$ and $f_3$ were required by the original models, they also become prerequisites of the new model $m_4$. Observe that the different logical components of the latter model may receive different sets of features as input, as is the case in the present example. At the same time, it should be kept in mind that the bijective association between datasets and models depends critically on the choice of features, as well as on the current dataset and modeling framework.

Figure 8: A new model is born. Having found that the dataset $w_4$ corresponds identically to the intersection between $w_2$ and $w_7$, and since both these datasets are associated with respective models, it becomes possible to derive a model that is satisfied by the data elements in $w_4$. This can be done by defining the new model $m_4$ as having as input the features inherited from both $w_2$ and $w_7$, which correspond to $f_1$ and $f_3$. The model $m_4$ is immediately provided by the logical and operation between the models $m_2$ and $m_7$.

Another example of the possibilities allowed by the suggested approach concerns algebraic equations such as:

$$m_i \vee ?m_j \equiv m_k \tag{3}$$

In other words, we would search in the current modeling framework $M$ for a pair of models satisfying the above condition.
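Equation 3 can be approached in the same brute-force manner by moving to the data domain, where it reads $w_i \cup ?w_j = w_k$. The sketch below uses hypothetical datasets and excludes the trivial candidates $w_i$ and $w_k$ themselves; note that more than one solution may exist.

```python
from typing import Dict, FrozenSet, List


def solve_or(i: str, k: str, datasets: Dict[str, FrozenSet[int]]) -> List[str]:
    """Solve m_i ∨ ?m_j ≡ m_k through the data domain, i.e. w_i ∪ ?w_j == w_k."""
    return [j for j, w_j in datasets.items()
            if j not in (i, k) and datasets[i] | w_j == datasets[k]]


datasets = {
    "1": frozenset({1, 2}),
    "2": frozenset({3, 4}),
    "3": frozenset({1, 2, 3, 4}),
    "5": frozenset({2, 3, 4}),
}
print(solve_or("1", "3", datasets))  # ['2', '5']
```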

It is also possible to derive hybrid algebraic equations such as:

$$?w_j \cup [w_p]^C \equiv m_k \wedge \neg m_p \tag{4}$$

This type of equation can be solved either by translating the dataset-based (left-hand) side into the respective logical model equation, or vice versa, and then applying the procedures described above. In the former case, we would have:

$$?w_j \cup [w_p]^C \;\Longleftrightarrow\; ?m_j \vee \neg m_p \quad\Longrightarrow\quad ?m_j \vee \neg m_p \equiv m_k \wedge \neg m_p$$

Alternatively, we could write:

$$m_k \wedge \neg m_p \;\Longleftrightarrow\; w_k \cap [w_p]^C \quad\Longrightarrow\quad ?w_j \cup [w_p]^C \equiv w_k \cap [w_p]^C$$

Interestingly, as models (as well as their associated datasets) are progressively combined, a corresponding hierarchy is defined, as illustrated in Figure 9, which corresponds to the last of the models shown in the list above.


Figure 9: As models are developed and integrated by the proposed methodology, a respective hierarchy is progressively established, as illustrated in this hypothetical example. The hierarchical levels are indicated as h, with h = 0 corresponding to the root of the tree representing the hierarchy.


As a consequence of the bijective association between datasets and models established by the M* model, the above model hierarchy is immediately reflected in the data hierarchy illustrated in Figure 10.


Figure 10: The dataset hierarchy associated with the model hierarchy in Fig. 9 as a consequence of the bijective association between datasets and models established by the M* approach.
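The dual reading of a hierarchy can be sketched with a small expression tree (the `Leaf` and `Node` classes are hypothetical). The same tree, here for $m_k = m_i \vee [m_j \wedge m_p]$, is evaluated once with set operations and once with logical connectives, and the level $h$ of each leaf is measured from the root at $h = 0$. The models and $S_E$ are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Union


@dataclass(frozen=True)
class Leaf:
    name: str                      # an existing dataset-model pair


@dataclass(frozen=True)
class Node:
    op: str                        # "or" / "and"
    left: "Expr"
    right: "Expr"


Expr = Union[Leaf, Node]
Model = Callable[[int], bool]


def eval_data(e: Expr, datasets: Dict[str, FrozenSet[int]]) -> FrozenSet[int]:
    """Read the tree in the data domain (union / intersection)."""
    if isinstance(e, Leaf):
        return datasets[e.name]
    a, b = eval_data(e.left, datasets), eval_data(e.right, datasets)
    return a | b if e.op == "or" else a & b


def eval_model(e: Expr, models: Dict[str, Model]) -> Model:
    """Read the same tree in the model domain (OR / AND)."""
    if isinstance(e, Leaf):
        return models[e.name]
    a, b = eval_model(e.left, models), eval_model(e.right, models)
    if e.op == "or":
        return lambda x: a(x) or b(x)
    return lambda x: a(x) and b(x)


def levels(e: Expr, h: int = 0) -> Dict[str, int]:
    """Hierarchical level h of each leaf, with the root at h = 0."""
    if isinstance(e, Leaf):
        return {e.name: h}
    return {**levels(e.left, h + 1), **levels(e.right, h + 1)}


S_E = frozenset(range(1, 13))
models: Dict[str, Model] = {"i": lambda x: x % 5 == 0, "j": lambda x: x % 2 == 0,
                            "p": lambda x: x % 3 == 0}
datasets = {k: frozenset(x for x in S_E if m(x)) for k, m in models.items()}

tree = Node("or", Leaf("i"), Node("and", Leaf("j"), Leaf("p")))   # m_i ∨ [m_j ∧ m_p]
w_k = eval_data(tree, datasets)
m_k = eval_model(tree, models)
print(sorted(w_k), levels(tree))  # [5, 6, 10, 12] {'i': 1, 'j': 2, 'p': 2}
assert w_k == frozenset(x for x in S_E if m_k(x))
```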


Provided the number of datasets associated with respective models is not too large and that we are not aiming at several hierarchical compositions of models, it is possible to search for the solution of problems by systematically checking every possible combination of the set/logic operations while progressively increasing the number of datasets or models.

It is interesting to observe that the level of abstraction in the modeling framework increases as we move from the leaves to the root of the respective tree. That is so because the understanding of composite models demands the understanding of the preceding models. At the same time, the level of generalization may increase, in case the combinations involve the union of sets, or decrease, as implied by intersections between the sets. As such, the proposed framework accounts for these two important paradigms, while also indicating that the abstraction increases in both cases, as it is ultimately related to the complexity of the respective logical model.

The proposed M* approach also relates to the critically important concept of causality. It is posited here that, at least as typically understood by humans, causality corresponds precisely to the bijective association established between a dataset of relevance and its respective model, and more specifically to the fact that every element of a dataset satisfies, implies (or causes) the model. In other words, it is only the full presence of all the conditions of a model that can enable the model verification. Incidentally, observe that this possible definition of causality, which is after all a human concept just like complexity, implies the time sequentiality that is often used to characterize causality. After all, it is the presence of all the data elements in the dataset representing the event of interest that causes the model (decision) conditions to be satisfied. The situations corresponding to incorrect identifications of causality would correspond to models only partially associated to an incorrect, though related, model, which would otherwise lead to correlations between the observed events. Recall that in the proposed framework a single data element may satisfy (imply or cause) two or more models.

In cases where the number of data elements satisfying each model is taken as a graded indication of model adherence, the existence of crossed connections, i.e. a data element satisfying more than one model, implies correlations between the activations of models that are related to the datasets in a not necessarily causal manner.


7 Case-Example: Binary Lattices 


As a more concrete example of representing the construction of a model framework by using the concepts and techniques suggested so far, we consider each data element $x_i$ to correspond to each of the possible instances of a binary lattice or array with dimension L x L. By ‘binary’ it is meant that the lattice elements can only assume the values ‘0’ or ‘1’.





Therefore, $\Omega$ corresponds to the set of every possible binary pattern on the lattice, while the datasets correspond to subsets of $\Omega$, i.e. to elements of its power set.

Given an integer value L, a total of $N_x = 2^{(L^2)}$ possible data elements are respectively defined. Figure 11 presents the set of all possible patterns for L = 2.





Figure 11: All the possible $2^{(2^2)} = 16$ data elements that can be derived from a binary lattice with dimension 2 x 2. Zeros can be understood to correspond to the blue points, and ones to the brown points.


Observe that the number of patterns increases very steeply with L, as $2^{(L^2)}$, i.e. even faster than exponentially. We shall adopt L = 3 for our first case example, therefore implying $N_x = 2^9 = 512$ possible data elements or basic patterns in $\Omega$.
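The count $N_x = 2^{(L^2)}$ can be checked by brute-force enumeration; the sketch below (the generator name `all_binary_lattices` is hypothetical) represents each lattice as a tuple of rows:

```python
from itertools import product
from typing import Iterator, Tuple

Lattice = Tuple[Tuple[int, ...], ...]


def all_binary_lattices(L: int) -> Iterator[Lattice]:
    """Yield every L x L binary lattice as a tuple of rows (0 = background, 1 = foreground)."""
    for bits in product((0, 1), repeat=L * L):
        yield tuple(bits[r * L:(r + 1) * L] for r in range(L))


for L in (2, 3):
    n_patterns = sum(1 for _ in all_binary_lattices(L))
    print(L, n_patterns, 2 ** (L * L))  # enumerated count next to the closed form
```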

For simplicity’s sake, we will assume that E contains all possible data elements of $\Omega$, but this is not a necessary requirement of the M* approach.

Let’s assume that several datasets have already been 
singled out for their potential relevance and associated to 
respective models, as presented in Table 2. 

This table includes the textual meaning of each model, the involved features and data structures, the formal condition representing the model, as well as the size of the dataset respectively satisfying each model. Recall that each of these models has a respectively associated dataset. Also, it should be realized that the choice of this initial modeling framework is, in principle, completely arbitrary, though the effective combination of models will also depend critically on these initial choices.

| $m_i$ | textual description of the model | features/structures | decision | size |
|---|---|---|---|---|
| $m_1$ | contains only an isolated point | number of points $n$ | $n = 1$ | |
| $m_2$ | contains 2 points | number of points $n$ | $n = 2$ | |
| $m_3$ | contains 3 points | number of points $n$ | $n = 3$ | |
| $m_4$ | contains 4 points | number of points $n$ | $n = 4$ | |
| $m_5$ | contains at least 3 points | number of points $n$ | $n \geq 3$ | 406 |
| $m_6$ | | | $[1,1] \in \{x_k\}$ | 256 |
| $m_7$ | | | $[1,1], [N,N] \in \{r_k\}$ | 88 |
| $m_8$ | | | $= 1$ or $2$ for every point | 291 |
| $m_9$ | | | $> 1$ for at least one point | 224 |

Table 2: A possible model framework M for the binary lattice case-example.

For simplicity’s sake, each of the binary lattice elements, henceforth called a point, is here understood to be a square with four margins, two vertical and two horizontal. The points with value ‘0’ are called background points, while those equal to ‘1’ are said to be foreground points. A point a in the binary lattice is said to be adjacent to another point b provided they share a vertical or horizontal margin. A connected component is a set of foreground points such that it is possible to move between any pair of its constituent points through adjacent margins. The local width of a foreground point is henceforth understood as the number of its adjacent foreground points, i.e. its neighbors. A binary lattice element is said to be thin whenever each of its foreground points has width 1 or 2 (e.g. [8]).
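The definitions above can be sketched in Python as follows. This is an illustrative implementation under stated assumptions, not the author's code: points are indexed from 0, so the lattice element [1,1] becomes (0, 0) and [N,N] becomes (L-1, L-1); adjacency is the 4-neighborhood (shared vertical or horizontal margin); and the empty lattice is treated as vacuously thin. The counts it produces therefore need not coincide with every size reported in Table 2. All function names are hypothetical.

```python
from collections import deque
from itertools import product
from typing import Iterator, Set, Tuple

Point = Tuple[int, int]  # (row, column), 0-based


def all_foreground_sets(L: int) -> Iterator[Set[Point]]:
    """Yield the foreground points of every L x L binary lattice."""
    cells = [(r, c) for r in range(L) for c in range(L)]
    for bits in product((0, 1), repeat=L * L):
        yield {p for p, b in zip(cells, bits) if b}


def adjacent(p: Point, pts: Set[Point]) -> Set[Point]:
    """Foreground points sharing a vertical or horizontal margin with p."""
    r, c = p
    return {q for q in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)) if q in pts}


def connected(pts: Set[Point], a: Point, b: Point) -> bool:
    """True if a and b lie in the same connected component of pts (breadth-first search)."""
    if a not in pts or b not in pts:
        return False
    seen, queue = {a}, deque([a])
    while queue:
        p = queue.popleft()
        if p == b:
            return True
        for q in adjacent(p, pts) - seen:
            seen.add(q)
            queue.append(q)
    return False


def is_thin(pts: Set[Point]) -> bool:
    """Every foreground point has local width (number of neighbors) 1 or 2."""
    return all(len(adjacent(p, pts)) in (1, 2) for p in pts)


L = 3
patterns = list(all_foreground_sets(L))
n_single = sum(1 for s in patterns if len(s) == 1)       # exactly one foreground point
n_with_origin = sum(1 for s in patterns if (0, 0) in s)  # point [1,1] is foreground
n_corners = sum(1 for s in patterns if connected(s, (0, 0), (L - 1, L - 1)))
n_thin = sum(1 for s in patterns if is_thin(s))
print(len(patterns), n_single, n_with_origin)  # 512 9 256
```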

The models adopted in this example framework range from very simple (e.g. $m_1$) to moderately complex (e.g. $m_6$, $m_7$, $m_8$), though this is a largely subjective classification. It is interesting to observe that we humans tend to have more difficulty in handling the complement of a model than its direct definition, as is the case of $m_9$.

Let’s now proceed to the dataset $\omega_{10}$ shown in Figure 12.

Figure 12: A puzzle. A dataset $\omega_{10}$ containing possible data elements derived from the 3 x 3 binary lattice needs to be associated to a model. Can it be derived from the model framework in Table 2? What is the property shared by these patterns?

What is the model fully explaining this dataset? Can it be derived from the modeling framework in Table 2? This problem can be tackled by considering the several types of combinations of the existing datasets satisfying algebraic constructions such as:


Tar U ?w; = wio 

204 1 o = wio 

2a — lwz = Wio 

Tar U? a = W10 
Pw] E U lwj = wio 

Mag U ?m,|° = W10 
Kee Tm] = W10 

(2m; U?m,;) U ?Mk = w10 


As it happens, it can be verified that the combination $\omega_7 \cap \omega_8$ is identical to $\omega_{10}$, therefore implying the following respective model:

$$\omega_{10} = \omega_7 \cap \omega_8 \iff m_{10} = m_7 \wedge m_8$$


It follows that the larger the number of existing models (and datasets), the higher the probability of finding a combination of those models that can explain a new dataset drawn from $\Omega$. Recall that the compositions between the available models give rise to a respective hierarchical organization. Also, a new dataset can be explained by a completely new model not directly related to the existing ones, though perhaps sharing some of the adopted features.

Another interesting situation arises when a new model m is given, and we want to know whether it is satisfied by any of the existing datasets. This problem can be approached by trying to identify a logical combination of the existing models that yields the new model, or by checking every existing dataset against the new model.

For instance, let the new model $m_{11}$ be textually defined as “the dataset contains the shortest connected component comprising both the lattice elements [1,1] and [N,N].” A possible first step is to try to translate this condition in terms of the available measurements and models. First, we select model $m_7$, because it selects all data elements that contain at least one connected component containing [1,1] and [N,N]. Then, we take into account that the shortest possible path necessarily contains 3 points, which can be verified from model $m_2$ while making P = 3. The sought model can then be obtained as:

$$m_{11} = m_2 \wedge m_7$$


The dataset satisfying this condition can be immediately identified as the one associated to both $\omega_2$ and $\omega_7$, corresponding to the 3-point diagonal between the points [1,1] and [N,N]. Observe that the logical construction of the obtained combined model allows us to directly obtain, through logic-computational means, a procedure for verifying the adherence of specific data elements to it.


8 The M<@- Meta Model 


So far, we have assumed that all elements in all datasets satisfy the respectively associated model. However, it is possible that one or more elements currently in a given dataset do not satisfy the respective model. Possible causes for this include errors in determining the features, model inconsistencies whereby the mathematical implementation does not correspond to the textual characterization of the model, or errors in storing and handling the datasets and/or models. These situations will be henceforth understood as corresponding to the presence of error or noise.

From the perspective of the present work, the most important consequence of errors and sampling is the loss of the bijective association that is critical for the consistency of the M* reference model, which leads to modeling, decision and classification errors.

In this section we describe an adaptation of the M* model, here called M<**, which can be considered for dealing with the above characterized situations, as well as for applications where only approximate verification of the conditions implied by the models is allowed. The underlying idea in all these cases is to adopt some effective means for quantifying the similarity between any two sets.

A possibility to cope with errors and sampling would be to relax the binary decision on the validity of a model that is characteristic of the M* structure. This could be done by grading the degree of validity of a model.

In the present work, we will address the sampling and error limitations by adopting the following index, which expresses the similarity of two discrete datasets $\omega_i$ and $\omega_j$:

$$\Lambda(\omega_i, \omega_j) = \frac{|\omega_i \cap \omega_j|}{|\omega_i \cup \omega_j|} \qquad (5)$$

where the operator |A| stands for the cardinality (or number of elements) of the set A. This index is known as the Jaccard or Tanimoto index (e.g. [9]).

It can be verified that, conveniently, the above index is intrinsically normalized as $0 \leq \Lambda \leq 1$, so that its range does not depend on the sizes of the sets.

Observe that the above-mentioned graded validity of models can also be combined with the adoption of the $\Lambda$ similarity for deciding on the associations between datasets.

In cases where each of the data elements $x_{i,j}$ of each dataset $\omega_i$ has been associated with a non-negative weight $\alpha(x_{i,j})$ proportional to its relevance in the specific problem of interest, it is possible to adapt the similarity index as:

$$\Lambda(\omega_i, \omega_j) = \frac{\sum_{x \in \omega_i \cap \omega_j} \alpha(x)}{\sum_{x \in \omega_i \cup \omega_j} \alpha(x)} \qquad (6)$$

with $0 \leq \Lambda \leq 1$. Observe that this expression can be immediately applied in case the data elements have been associated to probabilities, therefore paving the way to the probabilistic index to be described in Section 11.
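A minimal sketch of the weighted index (6), assuming the weights are supplied as a Python function `alpha` over data elements (the function name `weighted_similarity` and the example weights are hypothetical). With uniform weights it reduces to the unweighted index (5):

```python
from typing import Callable, Hashable, Set


def weighted_similarity(w_i: Set[Hashable], w_j: Set[Hashable],
                        alpha: Callable[[Hashable], float]) -> float:
    """Total weight of w_i ∩ w_j divided by total weight of w_i ∪ w_j."""
    denominator = sum(alpha(x) for x in w_i | w_j)
    if denominator <= 0:
        raise ValueError("the union must carry positive total weight")
    return sum(alpha(x) for x in w_i & w_j) / denominator


A = set(range(1, 8))  # {1, ..., 7}
B = set(range(1, 9))  # {1, ..., 8}
uniform = weighted_similarity(A, B, lambda x: 1.0)                    # same as unweighted: 7/8
low_weight_8 = weighted_similarity(A, B, lambda x: 0.1 if x == 8 else 1.0)  # 7/7.1
print(uniform, low_weight_8)
```

Giving the spurious element a small weight, as in the second call, makes the index less sensitive to it.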

As an example, let a set A = {1,2,3,4,5,6,7}, so that 
|A| = 7. Suppose a new set B = {1,2,3,4,5,6,7,8} is to 
be compared with set A. This situation could be implied 
by obtaining a new version of a previous dataset, but 





incorporating by mistake the element ‘8’. 

In this case, we would have $A \cap B = \{1,2,3,4,5,6,7\}$ and $A \cup B = \{1,2,3,4,5,6,7,8\}$, therefore implying $|A \cap B| = 7$ and $|A \cup B| = 8$, from which we obtain:

$$\Lambda(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{7}{8} = 0.875 \qquad (7)$$


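The index $\Lambda()$ therefore provides an interesting resource for checking whether a new dataset could correspond to a noisy version of any of the existing datasets. This can be done by adopting a threshold T: a new dataset $\omega$ is taken as a noisy version of an existing dataset $\omega_i$ whenever $\Lambda(\omega, \omega_i) \geq T$, and as a genuinely new dataset when $\Lambda(\omega, \omega_i) < T$ for all the existing sets $\omega_i$.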

Let us now illustrate the situation where a new version B of the existing set A is obtained while overlooking some elements, e.g. B = {1,2,3,5,7}. In this case, we would get:

$$\Lambda(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{5}{7} \approx 0.714 \qquad (8)$$


As the obtained value is relatively high, it would suggest 
that the set B does not correspond to a new dataset and 
therefore can be merged into A or understood as being 
the same model. 

However, in case B = {1,5,10,12,20}, we would obtain:

$$\Lambda(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{2}{10} = 0.2 \qquad (9)$$

which is much smaller than in the previous case, suggesting that the set B does correspond to a new dataset.
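The three worked examples and the threshold rule can be reproduced with a short Python sketch; the threshold value T = 0.7 and the helper names `jaccard` and `match_existing` are illustrative assumptions, not values or functions prescribed by the method:

```python
from typing import Dict, Optional, Set


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard (Tanimoto) index |a ∩ b| / |a ∪ b|."""
    union = a | b
    if not union:
        raise ValueError("the index is undefined for two empty sets")
    return len(a & b) / len(union)


def match_existing(new: Set[int], existing: Dict[str, Set[int]], T: float) -> Optional[str]:
    """Most similar existing dataset if its similarity reaches T; None means 'new dataset'."""
    name, best = max(((n, jaccard(new, w)) for n, w in existing.items()),
                     key=lambda t: t[1])
    return name if best >= T else None


A = {1, 2, 3, 4, 5, 6, 7}
print(jaccard(A, {1, 2, 3, 4, 5, 6, 7, 8}))  # 0.875
print(jaccard(A, {1, 2, 3, 5, 7}))           # 0.7142857142857143
print(jaccard(A, {1, 5, 10, 12, 20}))        # 0.2
```

With T = 0.7, the version of A missing elements 4 and 6 would be merged with A, while B = {1,5,10,12,20} would be kept as a new dataset.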





As will be presented in Section 11, the above similarity index can be adapted to datasets associated to respective probability densities.





It should be observed that there are several other possible indices and methodologies that can be applied to deal with error and noise influencing data, features, and models. However, the alternative described above provides a particularly interesting approach, especially given its conceptual and computational simplicity. In addition, though we have discussed only possibilities for trying to avoid the incorporation of incorrect datasets, there are many other implications of incorrect or missing data and modeling that deserve to be addressed at more length.

In brief, the M<** meta model, as described here, can be simply understood as the M* model that uses the adopted similarity index in order to identify the most likely combination of existing models explaining a new dataset, as well as to decide whether two datasets could be treated as being the same.

Also important to realize is that the above-discussed errors and sampling imply losing the bijective association which is required for consistency in the reference M* model, leading to respective errors in modeling, classification, and decision.


9 Case Example: Elementary Number Theory


In order to illustrate the potential of the M<‘? approach, a model of numeric sets taking into account the property of a number being a multiple of a given integer is described in this section. This example involves new datasets that cannot be exactly explained by any of the models in the current modeling framework.

We start by defining $\Omega = \{2, 3, 4, \ldots, 20\}$. The number 1 is omitted as it is a trivial divisor of any natural number.

The datasets $\omega_i$, $i = 2, 3, \ldots, 20$, correspond to the multiples of $i$ up to 20. Thus, $\omega_3 = \{3, 6, 9, 12, 15, 18\}$. The respective models are immediately derived from the respective multiplicity property. For instance, $m_3$ corresponds to “all the numbers in $\Omega$ that are divisible by 3”. Therefore, the adopted modeling framework contains a total of 19 pairs $(\omega_i, m_i)$.

The first important point to be taken into account is that these 19 models are by no means sufficient for explaining most of the possible new datasets that can be drawn from $\Omega$. However, as we will see, the adoption of the $\Lambda$ similarity allows a surprisingly good performance, while being capable of providing interesting insights about possible explanations and interrelationships, even if no perfect combination can be found.

A modeling engine was implemented, using list manip- 
ulations in R, considering the following set operations (A 
and B are any of the existing datasets) shown together 
with the respective logical operations: 


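• $A \iff m_A$
• $A^C \iff \neg m_A$
• $A \cap B \iff m_A \wedge m_B$
• $(A \cap B)^C = A^C \cup B^C \iff \neg(m_A \wedge m_B)$ (De Morgan)
• $A^C \cap B = B - A \iff \neg m_A \wedge m_B$
• $A \cap B^C = A - B \iff m_A \wedge \neg m_B$
• $A \cup B \iff m_A \vee m_B$
• $(A \cup B)^C = A^C \cap B^C \iff \neg(m_A \vee m_B)$ (De Morgan)
• $A^C \cup B \iff \neg m_A \vee m_B$
• $A \cup B^C \iff m_A \vee \neg m_B$
• $(A^C \cap B) \cup (A \cap B^C) \iff m_A \oplus m_B$ (xor)
• $(A \cap B) \cup (A^C \cap B^C) \iff m_A \odot m_B$ (xnor)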
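The correspondence in the list above can be checked mechanically. In the following Python sketch (the names `OPERATIONS`, `comp` and `check_equivalence` are hypothetical, and the universe is the $\Omega = \{2, \ldots, 20\}$ of this example), each operation is stored as a pair of a set-side and a logic-side function, and every element of the universe is tested against both:

```python
from typing import Callable, Dict, Set, Tuple

OMEGA: Set[int] = set(range(2, 21))
W2 = {x for x in OMEGA if x % 2 == 0}  # multiples of 2
W3 = {x for x in OMEGA if x % 3 == 0}  # multiples of 3


def comp(a: Set[int]) -> Set[int]:
    """Complement with respect to OMEGA."""
    return OMEGA - a


SetOp = Callable[[Set[int], Set[int]], Set[int]]
BoolOp = Callable[[bool, bool], bool]

# Set-side and logic-side version of each operation, keyed by the same label.
OPERATIONS: Dict[str, Tuple[SetOp, BoolOp]] = {
    "A": (lambda a, b: a, lambda p, q: p),
    "not A": (lambda a, b: comp(a), lambda p, q: not p),
    "A and B": (lambda a, b: a & b, lambda p, q: p and q),
    "not (A and B)": (lambda a, b: comp(a) | comp(b), lambda p, q: not (p and q)),
    "not A and B": (lambda a, b: b - a, lambda p, q: (not p) and q),
    "A and not B": (lambda a, b: a - b, lambda p, q: p and not q),
    "A or B": (lambda a, b: a | b, lambda p, q: p or q),
    "not (A or B)": (lambda a, b: comp(a) & comp(b), lambda p, q: not (p or q)),
    "not A or B": (lambda a, b: comp(a) | b, lambda p, q: (not p) or q),
    "A or not B": (lambda a, b: a | comp(b), lambda p, q: p or not q),
    "A xor B": (lambda a, b: (comp(a) & b) | (a & comp(b)), lambda p, q: p != q),
    "A xnor B": (lambda a, b: (a & b) | (comp(a) & comp(b)), lambda p, q: p == q),
}


def check_equivalence(a: Set[int], b: Set[int]) -> None:
    """Verify, element by element, that each set operation matches its logical counterpart."""
    for label, (set_op, bool_op) in OPERATIONS.items():
        result = set_op(a, b)
        for x in OMEGA:
            assert (x in result) == bool_op(x in a, x in b), (label, x)


check_equivalence(W2, W3)
print(sorted(OPERATIONS["A xor B"][0](W2, W3)))
```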


Only operations between two Boolean variables are considered, for the sake of simplicity and also for keeping the results more accessible to human interpretation.

Let’s now consider the data-driven query relative to the new dataset $\omega = \{2, 4, 6, 8, 10, 12, 14, 3, 6, 9, 12, 15\}$.

The engine found $\Lambda = 0.769$, respective to the model $m = m_2 \vee m_3$, which corresponds to the union of the multiples of 2 and 3. Though the similarity index is not maximum, the provided explanation is still quite reasonable, even if the given dataset cannot be fully expressed by the obtained combination (some values of the combination, namely 16, 18 and 20, are missing in $\omega$).
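A rough Python analogue of the modeling engine is sketched below. It is not the original R implementation: the set of pairwise operations, the tie tolerance and the names `best_combinations` and `OPS` are assumptions, so its output need not reproduce every value reported in this section exactly. It scores every ordered pair of datasets (including a dataset paired with itself) under each operation with the Jaccard index and keeps all the best-scoring expressions:

```python
from itertools import product
from typing import Callable, Dict, List, Set, Tuple

OMEGA: Set[int] = set(range(2, 21))
# omega_i: multiples of i within OMEGA, for i = 2, ..., 20 (19 datasets).
DATASETS: Dict[int, Set[int]] = {i: {x for x in OMEGA if x % i == 0} for i in range(2, 21)}


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard index; the target is assumed non-empty, so the union is never empty."""
    return len(a & b) / len(a | b)


# Assumed subset of pairwise operations, keyed by a template used for reporting.
OPS: Dict[str, Callable[[Set[int], Set[int]], Set[int]]] = {
    "{a} ∪ {b}": lambda a, b: a | b,
    "{a} ∩ {b}": lambda a, b: a & b,
    "{a} − {b}": lambda a, b: a - b,
    "({a} ∪ {b})^C": lambda a, b: OMEGA - (a | b),
    "({a} ∩ {b})^C": lambda a, b: OMEGA - (a & b),
    "{a}^C ∪ {b}": lambda a, b: (OMEGA - a) | b,
    "{a} xor {b}": lambda a, b: a ^ b,
    "{a} xnor {b}": lambda a, b: OMEGA - (a ^ b),
}


def best_combinations(target: Set[int], tol: float = 1e-12) -> Tuple[float, List[str]]:
    """Highest Jaccard score over all ordered dataset pairs and operations, with all ties."""
    best, labels = -1.0, []
    for (i, a), (j, b) in product(DATASETS.items(), repeat=2):
        for template, op in OPS.items():
            score = jaccard(target, op(a, b))
            label = template.format(a=f"w{i}", b=f"w{j}")
            if score > best + tol:
                best, labels = score, [label]
            elif abs(score - best) <= tol:
                labels.append(label)
    return best, labels


score, labels = best_combinations({2, 4, 6, 8, 10, 12, 14, 3, 6, 9, 12, 15})
print(round(score, 3), labels[:3])
```

Because ordered pairs are enumerated, symmetric operations appear twice among tied expressions (e.g. w2 ∪ w3 and w3 ∪ w2).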

Let’s now consider another example, respective to the new dataset $\omega = \{8, 10, 12, 14\}$. We get $\Lambda = 0.5$ for two approximate solutions: $\omega_{12} \cup \omega_{14}$ and $(\omega_4 \cap \omega_{10}^C) \cup (\omega_4^C \cap \omega_{10})$. Observe that, though all the numbers in this given dataset are multiples of 2, the set only contains 4 out of the 10 elements in $\omega_2$, so the obtained result is still fully compatible. Given that $\Lambda = 0.5$ can be considered too low, a new model would need to be defined for this dataset, as it cannot be approximated by combinations of those in the existing modeling framework. In this case, as observed in Section 5, it is also possible to associate $\omega$ to a restricted version of $\omega_2$, i.e. $\omega_2 - (\omega_2 - \omega)$. Another possibility is to take into account a further feature, such as being comprised within given minimum and maximum values.

Now, let’s make $\omega = \{2, 3, 5, 7, 11, 13, 17, 19\}$. The result provided by the engine is $\Lambda = 0.875$, corresponding to two equally similar approximate solutions, $(\omega_2 \cup \omega_9)^C$ as well as $(\omega_2 \cap \omega_{15}) \cup (\omega_2^C \cap \omega_{15}^C)$. Interestingly, the prime numbers in the range from 2 to 20 could be well modeled in terms of these two combinations of sets, with a relatively high similarity index.

As another example, let’s consider $\omega = \{3\}$. In this case we get $\Lambda = 1/3$, with 5 respective possible approximate combinations. This example illustrates the issue that it is difficult to express a small set as a pairwise combination of larger sets, such as most of those in the existing modeling framework.

Let’s now take the dataset $\omega = \{4, 8, 12, 16, 20\}$. By querying the described engine, we have two solutions for $\Lambda = 1$: $\omega_4$ and $\omega_2 \cap \omega_4$, therefore capturing the fact that this last example dataset contains all the elements of $\Omega$ that are multiples of both 2 and 4.

As a last example, we have $\omega = \{3, 5, 7, 9, 11, 13, 15, 17, 19\}$, which corresponds to the odd numbers between 2 and 20. The exact solution $\omega_2^C$ is provided by the engine.

As illustrated above, even though it does not ensure full accuracy, the application of the M <°? approach can still provide valuable insights while understanding datasets and trying to find models for them.
10 Case Example: Verifying a Hierarchical Structure


In this section, we describe another illustration of how the M* framework can be employed in order to verify a hierarchical structure. This example is intentionally simple, given its predominantly didactic objective. In addition, this example illustrates only one of the many methods that can be applied for deriving or verifying a hierarchy.

Though the general problem of verifying or deriving 
hierarchies from data is particularly complex, requiring 





more sophisticated methodologies, the present example 
illustrates how some of the intrinsic aspects of the M<‘? 
framework can be applied to this important problem. 

Let’s consider the hierarchical structure depicted in 
Figure 13. 

The basic data elements and datasets that characterize 
a simple house have been organized into three successive 
hierarchical layers or levels, with the highest one corre- 
sponding to the topmost dataset, which is associated to 
the concept of house. 

We start by taking the root, corresponding to the con- 
cept of house, as a reference. 

First, we obtain the similarity index $\Lambda$ for each pair of nodes obtained by considering each of the elements from the three distinct hierarchical layers in the original tree. This yields non-null values only between pairs involving elements from the first and second levels, and from the second and third levels, with null values of similarity for pairs composed of elements from layers one and three. This result confirms the bipartition between adjacent layers, which is characteristic of hierarchical organization.

Then, we can check the type of relationship between the adjacent layers. Two situations will be considered here: (i) generalization, characterized by the union of the datasets from the lower layer; and (ii) specialization, characterized by the intersection of the datasets from the lower layer. In order to do so, we obtain the similarity indices between each dataset in a given layer and the pairwise unions of the datasets in the lower hierarchical layer. Then, we repeat this verification, but considering intersection, instead of union, between the datasets from the two adjacent layers.





In the case of the present example, we obtain higher val- 
ues for union, corroborating that the hierarchical struc- 
ture in this particular example has been obtained from 
the generalization of the datasets from lower to higher 
hierarchical levels. 
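The union-versus-intersection test can be sketched as follows, assuming each node is represented by a Python set of labels. The toy parent and children below are hypothetical and are not the contents of Figure 13; the function name `union_vs_intersection` is also illustrative:

```python
from itertools import combinations
from typing import Dict, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0  # assumption: empty vs empty -> 0


def union_vs_intersection(parent: Set[str], children: Dict[str, Set[str]]) -> Dict[str, float]:
    """Best similarity between a parent dataset and the pairwise unions / intersections
    of its children (at least two children are required)."""
    pairs = list(combinations(children.values(), 2))
    return {
        "union": max(jaccard(parent, a | b) for a, b in pairs),
        "intersection": max(jaccard(parent, a & b) for a, b in pairs),
    }


# Hypothetical toy example: the parent is exactly the union of two of its children.
parent = {"door", "wall", "floor", "window", "bed"}
children = {
    "x": {"door", "wall", "floor"},
    "y": {"window", "bed", "floor"},
    "z": {"floor"},
}
scores = union_vs_intersection(parent, children)
print(scores)  # {'union': 1.0, 'intersection': 0.2}
```

Higher union scores, as in this toy case, point to generalization; higher intersection scores would point to specialization.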

Observe that other dataset or logical operations can be considered while analyzing or deriving hierarchies. For instance, it is possible to verify, by using set difference, in which manner the concepts within a same layer differ from one another. A complementary type of analysis could involve using the xor and xnor logical operations. One particularly interesting possibility is to study, through the systematic application of all logical operations, how the presented hierarchy has possibly been defined and obtained, and to what degree the identified sequences of logical operations are supported by the set operations between the respectively associated datasets.

Figure 13: The hierarchical structure considered in the discussed case example, involving the description of a simple house. Three hierarchical levels, identified by light red, green and blue, can be identified.

By using the above identified concepts and methods, it is also possible to enhance a given hierarchical construction, e.g. by identifying links that are particularly weak and trying to complement the respective contents in the associated datasets.

Another interesting problem, which tends to be more challenging than those already mentioned here, concerns deriving, from a given set of datasets, a respective hierarchy of generalization or specialization structures, an application that brings us close to the interesting subject of ontologies.


11 The MW Stochastic Meta Model


There are several abstract and real-world situations in which the datasets are characterized by respective features that may extend continuously along the respective axes in the feature space. Or, more importantly, there are cases in which $\Omega$ contains a huge number of elements, which tends to be the case for several real-world situations (e.g. the set of all possible butterflies). Mathematically, the feature-based representation of these sets can be properly obtained in terms of respective multivariate probability densities representing the distribution of the data elements in the adopted feature space. At the same time as this probabilistic approach enables the consideration of many interesting problems, it also undermines the full consistency between data and model that is characteristic of the M* model. This is to a great extent a consequence of the fact that the probability densities associated to specific models/categories often overlap one another. Each probability density is intrinsically associated to a respective random variable, or measurement.

This type of representation can be shown to provide virtually all the statistical information that may be required regarding the dataset as described by the adopted features. For instance, the probability of observing a data element within a given subset of the feature space can be estimated by integrating the density over that region, i.e. as the hyper-volume under the density taken on that region. The reader should not be put off by the seemingly sophisticated mathematical concepts adopted, as the overall idea and principles are likely to be grasped with the help of the case-example provided in Section 12.





The representation of discrete, sampled feature-based datasets also leads to the possibility of applying Bayesian decision (e.g. [8, 3, 5]) in order to decide which is the most likely category for a specific data element. This same approach also provides means for estimating the probability of making incorrect decisions. Even more importantly, the above outlined Bayesian decision method can be shown to provide optimal results in the sense of minimizing the chances of making decision errors. However, this important property requires the availability of the exact probability densities, though good results can still be expected from representative samples of data.

Because it is impossible to obtain an infinite number of samples allowing the complete characterization of these continuous variables, we need to resort to some suitable methodology capable of yielding satisfactory estimations in terms of estimated probability densities. The integration of this approach into the suggested M* meta model leads to the M<°7 variant, capable of addressing situations characterized by incomplete data sampling. The remainder of this section presents a description of this approach.

Let $\omega_i$, $i = 1, 2, \ldots, N_\omega$, be datasets sampled from a universe $\Omega$. Each of these datasets $\omega_i$ is chosen to be characterized in terms of a set of random variables (or features) $f_j$, $j = 1, 2, \ldots, N_f$. In order to obtain a suitable probability density representing each of these datasets, it is possible to perform a kernel expansion (e.g. [5, 3, 10, 11]) on that set, with each individual data element being represented as a Dirac delta function $\delta(\vec{f})$, $\vec{f} = [f_1 \; f_2 \; \ldots \; f_{N_f}]$.
The Gaussian kernel represents an interesting choice given its mathematical properties, but other kernels can be more suitable depending on the type of datasets and the features of their original probability densities. The expansion itself can be performed by convolving (e.g. [10, 12]) the given dataset with a normalized version of the kernel. The estimated probability density obtained by kernel expansion of each dataset $\omega_i$ is henceforth expressed as $p_i(\vec{f})$.

Now, in order that each dataset be associated to a support region having finite hyper-area, we perform a thresholding operation on the respective estimated probability density. The resulting support hyper-region is henceforth indicated as $\rho_i$, which is thus necessarily a subset of the respective feature space.

The value of the threshold can be determined as corresponding to the situation in which the respectively obtained support region contains a fixed ratio $\chi$ of the possible elements (this is related to the concept of percentile). In practice, one is likely to choose large values of $\chi$ in order to retain a representative set, but other approaches may also be of interest.
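The kernel expansion and the $\chi$-based thresholding can be sketched on a regular 2D grid with NumPy. This is a minimal illustration under assumptions: the bandwidth `sigma`, the grid, the synthetic sample and the function names are hypothetical, and the density is obtained by summing one normalized Gaussian per data element, which is equivalent to convolving the sum of Dirac deltas with the kernel. The support region keeps the highest-density grid cells until they jointly hold a fraction $\chi$ of the estimated mass, which amounts to thresholding the density:

```python
import numpy as np


def kde_on_grid(points: np.ndarray, xs: np.ndarray, ys: np.ndarray, sigma: float) -> np.ndarray:
    """Gaussian kernel density estimate of 2D points, evaluated on the grid xs x ys.

    Rows index ys and columns index xs.
    """
    X, Y = np.meshgrid(xs, ys)
    density = np.zeros(X.shape)
    for px, py in points:
        density += np.exp(-((X - px) ** 2 + (Y - py) ** 2) / (2.0 * sigma ** 2))
    return density / (len(points) * 2.0 * np.pi * sigma ** 2)


def support_region(density: np.ndarray, chi: float) -> np.ndarray:
    """Boolean mask of the densest grid cells that jointly hold a fraction chi of the mass.

    On a uniform grid the cell area cancels, so raw density values can be used.
    """
    order = np.argsort(density, axis=None)[::-1]  # flat indices, densest first
    cumulative = np.cumsum(density.ravel()[order]) / density.sum()
    n_keep = int(np.searchsorted(cumulative, chi)) + 1  # shortest prefix reaching chi
    mask = np.zeros(density.size, dtype=bool)
    mask[order[:n_keep]] = True
    return mask.reshape(density.shape)


# Usage with a hypothetical 2D sample (not real data).
rng = np.random.default_rng(0)
points = rng.normal(0.0, 1.0, size=(100, 2))
xs = ys = np.linspace(-5.0, 5.0, 201)
cell_area = (xs[1] - xs[0]) * (ys[1] - ys[0])
p_i = kde_on_grid(points, xs, ys, sigma=0.5)
rho_i = support_region(p_i, chi=0.97)
print(p_i.sum() * cell_area, rho_i.mean())  # total mass close to 1; fraction of grid kept
```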

Observe that the option of defining the support regions in terms of percentiles implies that a sparser dataset will cover a larger area than a more compact counterpart. These situations may be addressed by also taking into account the density values associated to the support regions, reflecting the fact that the less dense portions of the support region will have smaller weight.





It is also necessary to derive the overall probability density as defined by all elements in all the existing datasets. This can be done by first obtaining the union of all available datasets, i.e. $\omega_1 \cup \omega_2 \cup \ldots \cup \omega_{N_\omega}$, and then performing a kernel expansion, possibly considering the same $\chi$ as adopted for estimating the other support regions. The estimated overall probability density is henceforth represented as $p(\vec{f})$.

The derivation of the support regions for each involved dataset has the immediate benefit that these regions have finite area, allowing them to be combined through the set operations involved in the described M* modeling approach.

We are now in a position to generalize the similarity in- 
dex described in Section 8 in order to quantify the similar- 





ity between two sets A and B when represented in terms 
of probability densities, thus enabling the identification of 
the more likely model combination possibly explaining a 
new dataset. 

Let A and B be two datasets described in terms of their respective probability densities, as well as the associated support regions $\rho_A$ and $\rho_B$ as obtained for a chosen $\chi$. The similarity between those two datasets can then be estimated as:

$$\Lambda(A, B) = \frac{\int_{\rho_A \cap \rho_B} p(\vec{f}) \, d\vec{f}}{\int_{\rho_A \cup \rho_B} p(\vec{f}) \, d\vec{f}} \qquad (10)$$

We again have that $0 \leq \Lambda() \leq 1$. Observe that it is also possible to assign normalized weights to each of the intersection regions between the support densities in a new dataset, corresponding to the integral of the probability density within that same region. These weights can then be incorporated into the above equation, so that the portions of the support region of the new dataset that explain a smaller fraction of the overall data population have a smaller influence on the decision.
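On a regular grid, the similarity between support regions can be approximated by replacing the integrals with sums over grid cells, the cell area cancelling in the ratio. The sketch below assumes that the index weights both the intersection and the union of the support regions by the overall density p, that $\rho_A$ and $\rho_B$ are boolean masks (for instance, as produced by a thresholding step) and that p is an array on the same grid; the function name `support_similarity` is hypothetical. With a uniform density it reduces to the ratio of areas:

```python
import numpy as np


def support_similarity(rho_a: np.ndarray, rho_b: np.ndarray, p: np.ndarray) -> float:
    """Mass of the overall density p over rho_a ∩ rho_b divided by its mass over rho_a ∪ rho_b."""
    union_mass = p[rho_a | rho_b].sum()
    if union_mass <= 0:
        raise ValueError("the union of the support regions carries no density mass")
    return float(p[rho_a & rho_b].sum() / union_mass)


# Hypothetical example on a 4 x 10 grid with a uniform overall density.
p = np.ones((4, 10))
rho_a = np.zeros((4, 10), dtype=bool)
rho_b = np.zeros((4, 10), dtype=bool)
rho_a[:, :6] = True  # columns 0 to 5
rho_b[:, 3:] = True  # columns 3 to 9
print(support_similarity(rho_a, rho_b, p))  # 0.3
```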

It should be observed that the approach described above is still empirical, so that further formal validations should be developed. It is also possible to consider alternative methods for comparing multidimensional distributions, such as those involving adaptations of the non-parametric Kolmogorov-Smirnov test (e.g. [13]) or the Jaccard index adapted to compare distributions and scalar fields [14]. In particular, the densities can be converted to multisets and then compared by using the Jaccard index [15].

The index obtained above provides a simple and interesting way of comparing a new dataset $\omega$ with combinations of the existing datasets obtained through respective set operations, in a manner analogous to the M* approach, but now also incorporating the estimated probability densities respectively associated to each dataset.
The association of the density distributions to limited-area support regions also paves the way for applying mathematical morphology (e.g. [16]) operations on these sets, allowing the derivation of dilated and eroded versions of the support and intersection regions, to name just a few possibilities, which can provide additional information about the shape and interrelationship of the datasets in the original feature space. For instance, it would be possible to dilate the regions of all existing groups and use the set difference between the result and the original support regions in order to obtain the borders of the datasets.
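The border extraction mentioned above can be sketched with a binary dilation implemented directly in NumPy (a 4-neighborhood structuring element is assumed, and the function names are hypothetical): dilating a support region and removing the original region leaves a one-cell-thick outer border.

```python
import numpy as np


def dilate(mask: np.ndarray) -> np.ndarray:
    """Binary dilation of a 2D mask with a 3x3 cross (4-neighborhood) structuring element."""
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    return (padded[1:-1, 1:-1] | padded[:-2, 1:-1] | padded[2:, 1:-1]
            | padded[1:-1, :-2] | padded[1:-1, 2:])


def outer_border(mask: np.ndarray) -> np.ndarray:
    """Cells gained by one dilation step: dilate(mask) minus the original mask."""
    return dilate(mask) & ~mask


region = np.zeros((5, 5), dtype=bool)
region[1:4, 1:4] = True  # hypothetical 3x3 support region
border = outer_border(region)
print(border.astype(int))
```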

Interestingly, the stochastic approach described in the present section paves the way to other important capabilities, including the possibility of obtaining not only a likely combination of models explaining a new dataset, but also the quantification of how many elements of the latter relate to each of the involved existing datasets. This possibility, as well as the overall stochastic model M<°~ described in this section, are further discussed and illustrated with respect to a specific real-world dataset in the next section. In a sense, this type of more complete description of a given dataset extends the concept of a dichotomic pattern recognition decision to a relatively more complete model, providing additional information about how the new dataset explains and relates to the other existing datasets (and models).


12 Case-Example: The Iris Dataset


We now address a typical case of supervised pattern recognition by using the Iris dataset, which consists of 50 individual iris flowers from each of 3 species, each flower being characterized by 4 features. Thus, we have 150 individuals in total. We shall restrict ourselves to features 2 and 3 in order to allow the feature space to be more easily visualized.

Figure 14(a) depicts the distribution of all the 150 in- 
dividuals in the respective two-dimensional feature space, 
with the three categories being identified by respective 
colors. 

Figure 14: The three species in the iris dataset shown in a feature space derived from the original features f_2 and f_3: species 1 (green), species 2 (red), and species 3 (blue).

The first step in our modeling approach consists of obtaining kernel expansions of the three groups of points which, in the case of the present example, is achieved by using a circularly symmetric gaussian as kernel, assuming x = 0.97. The kernel expansion is performed by convolving the original data elements in each group, which are represented as Dirac deltas, with the normalized gaussian kernel. The adoption of a fixed percentile is reasonable given that the three datasets present relatively similar sparsity. Observe also that, by varying the parameter x, multi-scale models of the datasets can be derived.
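The kernel expansion and the derivation of a support region can be sketched as follows. This is a minimal pure-Python illustration under explicit assumptions: the densities are evaluated on a finite grid of points and renormalized there; the kernel width `sigma` is a hypothetical free parameter; and we interpret the 0.97 parameter as the fraction of probability mass retained by the support region, taken here as the smallest set of highest-density grid cells reaching that fraction. Other readings of the parameter are possible.

```python
import math


def kernel_expansion(points, grid, sigma):
    """Convolve Dirac deltas at `points` with a normalized isotropic 2D
    Gaussian of width `sigma`, evaluated at each (x, y) location in `grid`.
    Values are renormalized so that they sum to 1 over the grid."""
    norm = 1.0 / (2.0 * math.pi * sigma ** 2)
    dens = {}
    for g in grid:
        dens[g] = sum(norm * math.exp(-((g[0] - x) ** 2 + (g[1] - y) ** 2)
                                      / (2.0 * sigma ** 2))
                      for (x, y) in points)
    total = sum(dens.values())
    return {g: v / total for g, v in dens.items()}


def support_region(density, fraction=0.97):
    """Smallest set of highest-density cells holding at least `fraction`
    of the total mass (assumed interpretation of the percentile)."""
    region, acc = set(), 0.0
    for cell, v in sorted(density.items(), key=lambda kv: kv[1], reverse=True):
        if acc >= fraction:
            break
        region.add(cell)
        acc += v
    return region


# Hypothetical usage: two points on a 21 x 21 grid covering [0, 2] x [0, 2].
grid = [(i / 10, j / 10) for i in range(21) for j in range(21)]
dens = kernel_expansion([(1.0, 1.0), (1.2, 1.1)], grid, sigma=0.2)
region = support_region(dens, fraction=0.97)
```

Reducing `sigma` in this sketch shrinks the support region around the data points, which is one way of obtaining the multi-scale models mentioned above.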

Figure 15(a) illustrates the result of the gaussian kernel 
expansion of each of the three datasets in Figure 14(b). 


Figure 15: The densities and support regions obtained by non-parametric gaussian kernel expansion of the three types of flowers in the iris dataset: p_1(f), p_2(f), and p_3(f) (a), and the respective support regions (b). These results were obtained for x = 0.97.


Figure 15(b) presents the three decision regions ob- 
tained considering x = 0.97. 

The initial modeling framework is assumed to contain 
the models indicated in Table 3. 

The density p(f) corresponding to the union of the densities associated with the three iris species is shown in Figure 16.


| model | textual description of the model | features/structures decision | size |
| m_1 | flowers belonging to species 1 | high similarity index for type 1 | 50 |
| m_2 | flowers belonging to species 2 | high similarity index for type 2 | 50 |
| m_3 | flowers belonging to species 3 | high similarity index for type 3 | 50 |

Table 3: The initial modeling framework M for the Iris case-example.


Figure 16: The probability density function corresponding to the union of the three iris species.

Let's now assume that a new dataset ω_4 becomes available, which is shown in Figure 17.

Figure 17: The density (a) and support region (b) of a new dataset.

This set is then also kernel expanded by the same gaussian as before, also using x = 0.97. The result is shown in Figure 17(b). Each new data element is then verified with respect to each of the three models, and the models are updated accordingly.

It is now possible to perform a search for combinations of the existing models that best explain the new dataset ω_4. Among the several combinations tried, up to the second hierarchical level of composition of logical conditions, the following dataset was singled out as being the most likely to correspond to the new dataset ω_4:

ω_1 ∪ ω_2 ∪ ω_3
• Overall similarity with ω_4: A = 0.375
  - 23 individuals related to 70.41 % of dataset 1
  - 19 individuals related to 41.88 % of dataset 2
  - 22 individuals related to 50.82 % of dataset 3

Figure 18 depicts the probability density functions associated with each of the three species after being clipped by the support region of the new dataset. Integrating these clipped functions identifies the relationship between the new dataset and the three existing models.

Figure 18: The three probability density functions of the iris species clipped by the support region of the new dataset.


The number of individuals related to each of the existing datasets corresponds to the number of data elements contained in the intersection between the support region of the new dataset ω_4 and that of each of the three existing datasets, which may to some extent overlap one another.
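The two quantities reported for each existing dataset can be sketched as below, assuming that all densities are sampled on a common grid and that the new data elements have already been assigned to grid cells. The percentage is the mass of the existing density clipped by the new support region, and the count is the number of new data elements falling in the intersection of the two support regions. Which elements are counted is our reading of the text, and the function names are hypothetical.

```python
def clipped_mass(density, region):
    """Fraction of a gridded density lying inside `region` (a set of cells)."""
    total = sum(density.values())
    inside = sum(v for cell, v in density.items() if cell in region)
    return inside / total if total else 0.0


def relate_new_dataset(new_cells, new_support, existing):
    """For each existing dataset, return (count, percent).

    `existing` maps a dataset name to a (density, support_region) pair.
    `count` is the number of new data elements (given as grid cells) that
    fall in the intersection of the two support regions; `percent` is the
    share of the existing density clipped by the new support region.
    """
    report = {}
    for name, (density, support) in existing.items():
        overlap = new_support & support
        count = sum(1 for c in new_cells if c in overlap)
        report[name] = (count, 100.0 * clipped_mass(density, new_support))
    return report


# Toy example with two existing datasets on a small grid.
existing = {
    "dataset 1": ({(0, 0): 0.5, (0, 1): 0.5}, {(0, 0), (0, 1)}),
    "dataset 2": ({(5, 5): 1.0}, {(5, 5)}),
}
report = relate_new_dataset([(0, 1), (0, 2), (0, 1)], {(0, 1), (0, 2)}, existing)
```

In the toy example, two of the new elements fall where the new support overlaps that of dataset 1, which has half of its mass inside the new support region, while dataset 2 is unrelated.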

In the light of these results, the new dataset cannot be 
considered to be explainable by the union of the three 
original datasets corresponding to each of the three iris 
flower species. In addition, this new dataset seems to be 
more closely related to the iris type 1, though relatively 
similar relationships are observed also with the other two 
categories. 

As such, an alternative explanatory model in the domain of plant science would need to be found or developed for this new dataset. In the case of this particular example, as the new dataset contains elements similarly related to each of the three original iris species, it could be conjectured that the new samples correspond to physical alterations, such as a disease or changing environmental or genetic conditions, taking place on the flowers and causing the feature f_2 to shift in a similar manner for all three species.

Several other insights can be derived from the obtained 
descriptions as in the previous example. For instance, in 
case a new dataset is found not to relate directly to any of 
the existing models while presenting a good relationship 
with the union of the respective complements, it may be 
associated to the borders between the clusters in a fea- 
ture space. Such interstitial regions can provide valuable 
information for identifying effective separation regions in 
those spaces. 

As an example, consider a new dataset whose density and support region are presented in Figure 19.

Figure 19: The density (a) and support region (b) of another new dataset.


The application of the described approach yields:

ω_1 ∪ ω_2 ∪ ω_3
• Overall similarity with ω_4: A = 0.028
  - 0 individuals related to 0 % of dataset 1
  - 1 individual related to 3.15 % of dataset 2
  - 0 individuals related to 0 % of dataset 3





Therefore, the second new dataset can be understood as not corresponding to the model ω_1 ∪ ω_2 ∪ ω_3. Unlike the previous example, however, the minute number of elements related to any of the existing datasets suggests that the second new dataset lies on the borders or interstices between the existing data.

In case the new dataset corresponds to a single data element, the similarity between the M<σ> approach and Bayesian decision theory becomes more evident. Figure 20 illustrates the decision regions defined for the above data.

For each possible point (f_2, f_3), the values of every probability density associated with the existing datasets are checked, and the label of the existing dataset yielding the largest probability density value is associated with that point. This example considers equiprobable datasets; otherwise, the prior probabilities of the classes would also need to be taken into account (e.g. [8, 5]). Then, each new data element can be classified as belonging to the dataset whose label is indicated, in the decision regions, by the element's features.

Figure 20: The decision regions obtained by considering Bayesian decision theory with respect to the Iris example. Observe the clipping of the overlap regions shown in Fig. 15.
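The construction of the decision regions described above can be sketched as follows, assuming the class densities are sampled on a common grid. With the default equal priors it reproduces the equiprobable case used in the Iris example; the optional `priors` argument (a hypothetical name) shows where unequal class probabilities would enter.

```python
def decision_regions(densities, priors=None):
    """Label each grid cell with the dataset maximizing prior * density.

    `densities` maps a label to a dict {cell: density value}.
    """
    labels = list(densities)
    if priors is None:  # equiprobable datasets
        priors = {k: 1.0 / len(labels) for k in labels}
    cells = set().union(*(d.keys() for d in densities.values()))
    return {c: max(labels, key=lambda k: priors[k] * densities[k].get(c, 0.0))
            for c in cells}


# Toy densities for two datasets on a two-cell grid.
densities = {
    "w1": {(0, 0): 0.8, (0, 1): 0.2},
    "w2": {(0, 0): 0.3, (0, 1): 0.7},
}
regions = decision_regions(densities)
# A new data element falling in cell (0, 1) is assigned to regions[(0, 1)].
weighted = decision_regions(densities, priors={"w1": 0.9, "w2": 0.1})
```

With equal priors each cell goes to the locally denser dataset; a strongly unequal prior can flip a cell's label, which is the effect of the class probabilities mentioned in the text.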

Thus, for relatively narrow gaussian kernel expansions, the incorporation of new data elements that precedes the checking for model combinations corresponds very nearly to classical Bayesian decision theory. Though that approach naturally integrates resources that can provide information not only about data elements but also about datasets, including the adherence of each element to existing datasets other than the most likely one, as well as the decision errors, it is felt that these possibilities are not often realized, perhaps as a consequence of the focus on the dichotomic decision that is inherent to the decision procedure.

Interestingly, as illustrated in Figure 21, the M<σ> meta model converges to the M* reference model as the kernels become infinitesimal.
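A simple way to see this convergence, under the assumption of isotropic 2D gaussian kernels of standard deviation σ, is that the fraction of each kernel's mass lying within a radius r of its data element is 1 - exp(-r^2/(2σ^2)). For any fixed r > 0 this tends to 1 as σ tends to 0, so each kernel collapses onto its Dirac delta. The sketch below evaluates this closed form for decreasing widths; the radius value is arbitrary.

```python
import math


def mass_within(r, sigma):
    """Mass of an isotropic 2D Gaussian (std sigma) within radius r of its centre."""
    return 1.0 - math.exp(-r ** 2 / (2.0 * sigma ** 2))


r = 0.05  # arbitrary neighbourhood radius around each data element
widths = [0.5, 0.1, 0.02, 0.005]
masses = [mass_within(r, s) for s in widths]
for s, m in zip(widths, masses):
    print(f"sigma = {s:<6} mass within r: {m:.6f}")
```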

It should be realized that, while datasets in 2D can be analyzed visually by humans, most of the set combinations are typically difficult to infer in this manner, especially those involving set complements. Visually inferring set combinations in higher-dimensional feature spaces is even more challenging for human operators, which provides even greater motivation for using automated methods such as the one developed above.

Figure 21: The density obtained for the new dataset considering a much narrower gaussian kernel. Observe that the density tends to converge to the original points (Dirac deltas) as the width of the gaussian is progressively reduced, also implying that the M<σ> approach converges to M* in the limit of infinitesimal gaussian width. The non-infinitesimal width adopted in the stochastic case is necessary in order to allow non-zero probabilities in the probability densities describing each dataset.

Let's conclude this section by using the M<σ> approach to study the interrelationship between the three original iris datasets. The results obtained for each of these datasets are presented in the following:


ω_1
• Overall similarity with ω_1: A = 0.2735
  - 35 individuals related to 96.4914 % of dataset 1
  - 0 individuals related to 0 % of dataset 2
  - 0 individuals related to 0 % of dataset 3

ω_2
• Overall similarity with ω_2: A = 0.4203
  - 0 individuals related to 0 % of dataset 1
  - 45 individuals related to 96.597 % of dataset 2
  - 16 individuals related to 41.393 % of dataset 3

ω_3
• Overall similarity with ω_3: A = 0.4709
  - 0 individuals related to 0 % of dataset 1
  - 20 individuals related to 44.954 % of dataset 2
  - 42 individuals related to 96.477 % of dataset 3


As expected, when taken separately, each of the regions resulted in relatively low values of A. At the same time, the well-separated cluster defined by region 1 has been corroborated by the fact that no relationship has been found between this region and the others. This is not the case with regions 2 and 3, which presented substantial mutual overlap.


13 And Now, to the Features Layer


We have so far focused on the bijective association between datasets and models as a distinguishing feature of the M* model, assuming that the mapping between data elements and features was such as to ensure a bijective representation of the datasets by respective sets of features. In this section, we present the conditions that the mapping from data elements to features needs to satisfy in order to allow bijective pairing between datasets and models.

Henceforth, each distinct feature instantiation will be represented as f_j, irrespective of the feature type. A feature being instantiated by a data element means that a given feature type was measured with respect to that data element, with the resulting value represented as f_j. Therefore, the same feature type may give rise to two or more instantiated features f_j and f_k, provided these instantiations are mutually distinct.

Figure 22(a) illustrates several types of possible mappings from seven data elements x_i ∈ X into seven respective instantiated features f_j.

Given that we want the data elements to become bijectively associated with the feature instantiations, we add the reverse of every arrow, so that all arrows become two-sided, yielding the situation shown in Figure 22(b). The groups of data elements and features merged in this way are represented by respective subsets, shown as ellipses.
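Treating every arrow as two-sided amounts to taking the connected components of the bipartite graph linking data elements and feature instantiations. The sketch below computes these components with a union-find structure. The mapping used in the example is hypothetical and only loosely inspired by Figure 22; it is not read off the figure.

```python
def merge_components(mapping):
    """Merge data elements and feature instantiations into connected components.

    `mapping` maps each data element to the feature instantiations it is
    mapped into; arrows are treated as two-sided. Returns a list of
    (data_elements, features) pairs, one per component.
    """
    parent = {}

    def find(u):
        parent.setdefault(u, u)
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    def union(u, v):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[rv] = ru

    for x, feats in mapping.items():
        find(("x", x))
        for f in feats:
            union(("x", x), ("f", f))

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return [({n for kind, n in g if kind == "x"},
             {n for kind, n in g if kind == "f"}) for g in groups.values()]


# Hypothetical mapping (not taken from Figure 22).
mapping = {"x1": ["f1"], "x2": ["f1"], "x3": ["f2"], "x4": ["f3"],
           "x5": ["f5"], "x6": ["f6", "f7"], "x7": ["f7"]}
components = merge_components(mapping)
# Adding a hypothetical element x8 that also maps into f2 merges it with x3.
components_8 = merge_components({**mapping, "x8": ["f2"]})
```

Each returned pair is a candidate bijective association between a group of data elements and a group of feature instantiations; x6 and x7 end up merged together with f6 and f7.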

The case involving x_6 and x_7 deserves some particular attention. Here, x_6 has been mapped into two feature instantiations, namely f_6 and f_7. If that were the only mapping received by these two feature instantiations, we would have a situation analogous to that concerning x_1 and x_2, implying the elements involved to be subsumed (merged) into a respective subset. However, x_7 is also mapped to f_7, implying that both x_6 and x_7 share this same feature.

As indicated in Figure 22(b), one solution for obtaining a bijective pairing between these data elements and the respective features is to merge x_6 and x_7 into the set {x_6, x_7}, and the two feature instantiations into the set {f_6, f_7}. Alternatively, when the specific interests and demands of the modeling approach require x_6 and x_7 to be kept separate, bijective mapping can be achieved by removing the mapping from x_6 to f_7, especially in cases where this mapping is deemed not to be particularly relevant. This would lead to x_6 being bijectively associated with f_6, and x_7 with f_7. A more complete approach would be to consider both possible solutions and then evaluate the two respectively obtained modeling frameworks.

Figure 22: Mapping from data elements in the set X into the feature instantiations f_j in the domain F, illustrating several of the possible types of mappings that can be found (a). By incorporating the respective inverse of each mapping, leading to two-sided arrows, the data elements and features can be merged in the sense of forming connected components (b). Valid datasets with respect to this configuration of X and F may only correspond to subsets of the resulting groups and separated data elements, i.e. {x_1, x_2}, x_3, x_4, x_5 and {x_6, x_7}.

The above example can also help to further illustrate the fact that the current data and feature domains have a decisive influence on the specific mappings of data elements into respective features. In Figure 22(a), x_3 has been associated bijectively with f_2. Let's now suppose that a new data element x_8 is incorporated into X, and that it also maps into the feature instantiation f_2. It now becomes impossible to distinguish between x_3 and x_8, implying the bijective association to be transferred to {x_3, x_8} and f_2.

The merged data elements must necessarily be kept together within the datasets ω existing in any current environment E of the M* model, so as to maintain the overall consistency and therefore allow the bijective mapping between datasets and models. Observe that once the features have been associated in the manner described above, they become immaterial as far as the M* structure and coherence are concerned. However, the relationship between features and data elements will need to be taken into account while incorporating new data and improving the choice of features.

Observe that the above method for identifying the bijective associations between data elements and features can be immediately extended to handle bijective associations between datasets and features.

Figure 23 illustrates a possible bijective association between three datasets obtained from the data elements in Figure 22 and respective models.

Figure 23: Example of possible bijective association between three datasets derived from Figure 22(b) and respective models in the modeling framework M, including the respective features, also associated bijectively with those datasets.


Interestingly, observe that the above procedure ulti- 
mately allowed a bijective association to be established 
between subsets of data elements and subsets of features, 
except for the isolated data element and feature instanti- 
ation. 

One point of particular interest here regards the fact that whether an association between data and model is bijective depends on the context of the data. This can be easily appreciated in terms of the following example. Let's say that we have a specific pair of glasses. In case that object is used only inside one's house (e.g. for reading), there is no need to specify this pair of glasses very completely, because there are no other similar objects from which it must be distinguished. This is by no means the case in other situations, as when we are in a crowd, which requires many more features to be specified. In brief, the bijective mapping of a dataset into a model also depends strongly on the respective environment E.


14 And How About Clustering? 


It has already been observed in this work that there are two types of pattern recognition: supervised and unsupervised. As only the former has been considered so far in our approach, additional considerations can now be developed regarding the equally important subject of unsupervised classification, which is typically known as clustering (e.g. [5, 3, 17, 18, 19, 20]).

Generally speaking, clustering consists of finding separations or groupings between the respective distributions of data elements in the adopted feature space, as illustrated in Figure 24(a).

Figure 24: An example of clustered data containing 3 groups of points that are well delimited and separated (a), as well as another type of data separation which, though not having interstitial regions, still ensures that all the data elements are properly compartmentalized and classified (b).


In another related situation, depicted as situation (b) in Figure 24, the data elements are perfectly compartmentalized into respective categories or models, even though there are no interstices, the groups being adjacent to one another. This type of situation is rarely considered in clustering, because it cannot be solved by considering only the spatial distribution of the data elements. As a matter of fact, the identification of clusters in cases where the datasets of interest are not well separated represents a substantial challenge in pattern recognition. At the same time, several real datasets tend to present overlaps and adjacencies, as is the case with the iris dataset in Figure 14.

Figure 24(b) depicts an interesting situation that helps us understand why a subset satisfying a model does not necessarily constitute, or relate to, a cluster.

Here, Ω corresponds to a portion of ℝ², so that the possible data elements are ordered pairs (x, y), with x, y ∈ ℝ. The whole dataset ω delimited in the figure satisfies a well-defined model, namely $m: \{(x, y) \mid (x > 1) \land (x < 2) \land (y > 1) \land (y < 2)\}$. None of the points outside ω belongs to this model, so that we also have a perfect partition of Ω in terms of ω and its complement. Yet, despite all these specific and distinct properties of ω, it is in no way clustered, nor does it present interstitial regions with respect to the remainder of the points in Ω. In the situation shown in Figure 25(b), the main effect of the interstice is to call our attention to ω, motivating its explanation in terms of a model that was already valid.

Figure 25: An example (a) of a dataset ω which, though perfectly explained by the model $m: \{(x, y) \mid (x > 1) \land (x < 2) \land (y > 1) \land (y < 2)\}$, is not a cluster. In (b), we have a dataset ω that is both explained by this same model and is also a cluster. A dataset corresponding to a model need not be a cluster, though clusters may motivate a dataset to be explained.

The above examples help us realize that both well-separated datasets and other types of compartmentalized datasets can be explained by models. In this sense, being a cluster is a property of a dataset that is independent of whether a model is associated with it.
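The distinction can be made concrete with a small numerical sketch. Points are placed on a regular grid over Ω = [0, 3]², ω is the subset satisfying the model m, and we compare the smallest distance between ω and the remaining points with the internal spacing of ω. This gap ratio, and the specific way an interstice is carved out for case (b), are hypothetical illustrations rather than a clustering criterion proposed in the text.

```python
import math


def model_m(p):
    """The model m: 1 < x < 2 and 1 < y < 2."""
    x, y = p
    return 1 < x < 2 and 1 < y < 2


def gap_ratio(points, model):
    """Distance from the model's dataset to the other points, divided by the
    largest nearest-neighbour distance inside the dataset (a crude spacing)."""
    inside = [p for p in points if model(p)]
    outside = [p for p in points if not model(p)]
    gap = min(math.dist(p, q) for p in inside for q in outside)
    spacing = max(min(math.dist(p, q) for q in inside if q != p) for p in inside)
    return gap / spacing


grid = [(i / 4, j / 4) for i in range(13) for j in range(13)]
# Case (a): dense grid; omega is delimited by m but not isolated.
ratio_a = gap_ratio(grid, model_m)
# Case (b): hypothetical interstice obtained by dropping the points around omega.
far = [p for p in grid if model_m(p) or min(p) <= 0.5 or max(p) >= 2.5]
ratio_b = gap_ratio(far, model_m)
```

A ratio of 1 means the points of ω are as close to the rest of Ω as to each other, even though m delimits ω exactly; a ratio well above 1 signals the kind of interstice that makes ω stand out as a cluster.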

A particularly interesting relationship between clustering and modeling relates to the fact that clustering provides one of the main means through which delimited datasets can acquire enough interest to motivate their explanation through modeling. The association of specific properties of interest (e.g. a well-separated group of bacteria capable of digesting some specific material) with a clustered dataset tends to make it even more likely to be modeled. It should also be observed that such distinguishing properties are primary candidates to be adopted as part of the features for the specific characterization and modeling.

Another possibility worth considering is that, once a dataset that is not necessarily well separated is identified as having special importance, it may become a more isolated group as a consequence of the identification of more discriminative features, or even as a consequence of actions motivated by the need to separate the data, such as performing feature transformations. For instance, a plant species that initially differs little from others, but presents some interesting property, may be selectively bred to the point of being transformed into a more separated cluster. It is also possible to contemplate situations in which actions are taken to reduce the cluster separation.

Yet another possible mechanism leading to the creation of clusters is as follows. A single data element, or a few, are observed to present a given property of interest. Efforts are then invested in identifying more elements satisfying that property, but without consideration of a control counterpart. As a consequence of this biased procedure, the newly identified data elements will present the desired property and will therefore yield a well-separated cluster, because the data elements that would possibly be adjacent to them in the selected feature space have been filtered out.


15 Malleability of Datasets and Models





Given a structure represented as a graph that is subjected to topological changes such as the inclusion or removal of edges or nodes, it is possible to estimate the potential of this graph to undergo distinct successive changes. This can be done by using the recently introduced malleability measurement [21].





Because the M* framework can be represented as a graph, it becomes interesting to characterize and compare the potential of distinct modeling frameworks in terms of their respective malleability.

The main problem when devising means to quantify the malleability of a graph or network concerns the fact that two or more of these structures (e.g. two separate instances of a network along time), as identified in terms of labels associated with the respective nodes, may actually correspond to the same topological structure, differing only with respect to the associated labelings, a property known as isomorphism.

As it is often computationally very expensive to decide whether two or more graphs are isomorphic, a viable alternative is to compare those networks after they have been mapped into a set of features. Remarkably, this mapping does not need to be bijective, provided we remain limited to comparing the networks from the perspective of the adopted features, which underlies the approach reported in [21].

Let γ be a graph, and let a specific manner of changing this network be chosen. At a given time instant t, after the application of every possible instance of the considered change (e.g. removal of any of the possible edges), a total of D distinct networks are found to be derived from the initial configuration γ, each with a specific probability p_i. The malleability of this network can be calculated as:

$$M_\gamma = e^{-\sum_{i=1}^{D} p_i \log_2(p_i)} = e^{H} \qquad (11)$$

where H is the entropy of the probabilities p_i.

It is posited here that this index provides a good way to quantify the potential of a modeling framework to be adapted for the inclusion of new datasets and/or models. At the same time, it also supplies an objective means for characterizing the adaptivity and robustness of a given modeling framework.
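Eq. (11) can be sketched as follows. In line with the feature-based comparison attributed above to [21], each derived network is identified only through a feature vector, here its sorted degree sequence, which is a hypothetical and deliberately non-bijective choice; the probabilities p_i are estimated as relative frequencies over all single-edge removals. The base-2 entropy inside a natural exponential follows the equation as written above.

```python
import math
from collections import Counter


def malleability(derived_features):
    """Return (M, H) with H = -sum p_i log2 p_i and M = exp(H), where p_i is
    the relative frequency of each distinct derived network (feature vector)."""
    counts = Counter(derived_features)
    n = sum(counts.values())
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return math.exp(h), h


def edge_removal_features(nodes, edges):
    """Apply every single-edge removal and describe each resulting network
    by its sorted degree sequence."""
    features = []
    for removed in edges:
        degree = {v: 0 for v in nodes}
        for (u, v) in edges:
            if (u, v) != removed:
                degree[u] += 1
                degree[v] += 1
        features.append(tuple(sorted(degree.values())))
    return features


# Path graph 0-1-2-3: removing an end edge or the middle edge gives
# two distinct degree sequences, with probabilities 2/3 and 1/3.
path_features = edge_removal_features(range(4), [(0, 1), (1, 2), (2, 3)])
m_path, h_path = malleability(path_features)
```

Because the two end-edge removals produce isomorphic networks, the feature mapping counts them as the same outcome without any explicit isomorphism test, which is the motivation given in the text for comparing networks through features.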


16 Complexity 


Complexity (e.g. [22, 23, 24, 25, 26, 27]) has remained notoriously challenging to define in a manner that is both comprehensive and accurate. This is all the more remarkable given the great importance this concept has achieved not only in scientific and technological fields, but in virtually all human activities.

Though continuing efforts have been made at grasping what complexity means, many of these have been conceived to address relatively specific problems by using respective concepts and approaches. A review of some of the main approaches to quantifying complexity can be found in [28].

More recently, an attempt has been made at obtaining a more comprehensive and flexible definition of complexity that would remain compatible with the way it is more generally understood by humans [28]. The underlying idea is to relate complexity to the costs of developing and operating/maintaining a model. Given that the concept of cost was conceived precisely to adapt to relative variations of resource availability and demand over time and space, the cost of a model seems to provide a particularly interesting perspective from which to approach the complexity involved in modeling.





We have already seen that several events and facts conspire to limit the modeling approach, as described in Section 4. Some of the most relevant of these are now briefly discussed as indicators of complexity.

First, real-world universe sets Ω tend to be extremely large, as is the case even for specific types of plants and animals. Obtaining and maintaining such large datasets implies not only computational expenses but also curation by experts.

Given that the total number of subsets derivable from Ω is $2^N$, where N is the number of elements in Ω, a combinatorial explosion soon takes place that makes it unfeasible to consider systematic modeling approaches taking into account a substantial portion of the total possible number of datasets. Therefore, even in cases where the individuals are enumerable, and in the absence of sampling and other types of error and noise, it becomes necessary to resort to optimization techniques capable of selecting particularly interesting subsets out of an extremely large number of possibilities. This implies substantial development and computational costs and is prone to result in local minima, all of which makes the modeling expenses considerable and probably accounts for many of the sources of complexity typically associated with modeling.
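The scale of this explosion is easy to quantify. The sketch below counts the subsets for N = 150, the number of individuals in the Iris case-example, and contrasts it with exhaustive enumeration, which is only viable for tiny N.

```python
from itertools import combinations


def number_of_datasets(n):
    """Number of subsets (candidate datasets) of a universe with n elements."""
    return 2 ** n


def all_datasets(elements):
    """Exhaustively enumerate every subset; only feasible for very small sets."""
    for k in range(len(elements) + 1):
        yield from combinations(elements, k)


n_iris = number_of_datasets(150)
print(len(str(n_iris)))  # 46 digits, i.e. about 1.43 x 10^45 subsets
tiny = list(all_datasets(["a", "b", "c", "d"]))  # 16 subsets, including the empty one
```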

Then, there are situations in which the original data elements are too similar to one another, requiring a large number of features to be derived, some of which will probably imply relatively high experimental costs. A related problem implying expenses and complexity regards situations in which some of the individuals in E or Ω are rarely found.

Several other types of errors, noise, sampling and other limitations discussed in Section 4 can be added to the above. We can therefore conclude that modeling at a more extensive and accurate level can become extremely expensive and complex in several ways and situations. As discussed in Section 4, creativity could be one of the best antidotes to complexity, allowing interesting results to be obtained even in challenging situations.


17 Complex Networks 


With a history going back to the beginnings of humanity (e.g. maps), passing through the Königsberg bridges, graph theory, and sociological research, the subjects covered by the area of network science (e.g. [29, 30]), which focuses on complex networks, took off with studies of the Internet and the WWW. Briefly speaking, this area studies graphs whose topology cannot be described in terms of one or a few topological measurements such as the node degree (e.g. [31]). Therefore, a complex network tends to present topological features markedly distinct from those of a regular graph or of a stochastic counterpart such as a uniformly random network.

The remarkable success of this area, from both applied and theoretical perspectives, rests largely on the ability of graphs to represent virtually every discrete structure or phenomenon, also allowing for the fact that even continuous structures can be discretized to some resolution level. Another welcome aspect highlighted by network science consists in its motivation for studies integrating the topology and dynamics of complex systems.

Given the above-mentioned features of network science, it becomes particularly interesting to discuss, even if briefly, the relationship between complex networks and the meta modeling framework suggested in the present work. This is the subject of the present section which, however, shall focus on the fact that complex networks obtained from real or abstract datasets have frequently been derived by considering the similarity between the properties of the data to be explained (e.g. [32]). For instance, in an informational network where the nodes stand for specific documents such as books, web pages, or works of art, the interconnection between these nodes is often performed by taking into account content similarity, or overlap. Other examples involve networks constructed by considering the similarity between the features characterizing specific entities associated with nodes, such as in networks of living species, stars, shapes, etc. The similarity, or distance, between pairs of nodes is often represented in terms of weights associated with the respective edges.

A first important point here regards the fact that the 
very act of associating a node to an entity to be rep- 
resented actually corresponds to identifying a model for 
that entity, which needs to be done in terms of a set of 
features. Then, these features can be compared, typically 
through similarity, while interconnecting the nodes. 

While these approaches have great interest and potential, as has already been demonstrated by the large number of successful applications (e.g. [33]), the networks usually considered correspond to only one possible manner in which entities can be related, more specifically through similarities or distances. The framework described in this work seems to allow several possibilities for extending these network-based representations. This corresponds, basically, to employing the several combinations of models obtained while deriving a network, involving several types of set operations and logical constructions. For instance, it is possible to connect two nodes corresponding to respective datasets with the node associated with the union, difference, etc., of these two datasets. Each of these connections could be identified by a label corresponding to the applied set operation.
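The labeled set-operation links suggested above can be sketched as follows, assuming that datasets are finite sets of hashable data elements. The node naming scheme (for instance `D1|D2` for a union) and the helper `set_operation_network` are hypothetical choices made only for this illustration.

```python
from itertools import combinations


def set_operation_network(datasets):
    """Build a network whose nodes are datasets (frozensets) and whose
    labeled edges link each pair of operands to the result of a set operation."""
    nodes = dict(datasets)  # node name -> frozenset of data elements
    edges = []              # (operand node, result node, operation label)
    for a, b in combinations(sorted(datasets), 2):
        A, B = datasets[a], datasets[b]
        results = {
            f"{a}|{b}": ("union", A | B),
            f"{a}&{b}": ("intersection", A & B),
            f"{a}-{b}": ("difference", A - B),
            f"{b}-{a}": ("difference", B - A),
        }
        for name, (operation, elements) in results.items():
            nodes[name] = frozenset(elements)
            edges.append((a, name, operation))
            edges.append((b, name, operation))
    return nodes, edges


datasets = {"D1": frozenset({1, 2, 3}), "D2": frozenset({3, 4})}
nodes, edges = set_operation_network(datasets)
print(sorted(nodes["D1|D2"]), sorted(nodes["D1-D2"]))  # [1, 2, 3, 4] [1, 2]
print(len(edges))  # 8
```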
The integration between network science and the M* framework can proceed mainly by considering both the individual data elements and the respective datasets, through the combinations allowed by the exact or approximate combination of datasets and models. In approximate cases, it may be of particular interest to compare networks representing an exact modeling framework with the available datasets that can be approximately explained by each of the theoretical models. Other possibilities already hinted at in this work are to derive bipartite networks from the association between data elements and respective models, as well as to treat the hierarchical constructs resulting from combinations of data or models as networks, to which similarity links may also be added. Though the possibilities are too many to be identified and discussed here, it is hoped that the above discussion may motivate further related analysis and developments.
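A bipartite network linking data elements to models can be sketched by treating each model as a predicate over a finite environment and adding an edge whenever an element satisfies the model; the dataset associated with a model is then read off as the set of its neighbors. The environment, the two example models, and the function names are assumptions for illustration only.

```python
import math
from collections import defaultdict


def bipartite_network(environment, models):
    """Edges (element, model name) for every element satisfying a model."""
    return [(x, name) for name, satisfies in models.items()
            for x in environment if satisfies(x)]


def datasets_from_edges(edges):
    """Collect, for each model node, the data elements adjacent to it."""
    datasets = defaultdict(set)
    for x, name in edges:
        datasets[name].add(x)
    return dict(datasets)


E = range(1, 11)  # hypothetical data environment
models = {
    "even": lambda n: n % 2 == 0,
    "perfect square": lambda n: math.isqrt(n) ** 2 == n,
}
edges = bipartite_network(E, models)
print({name: sorted(xs) for name, xs in datasets_from_edges(edges).items()})
# {'even': [2, 4, 6, 8, 10], 'perfect square': [1, 4, 9]}
```

Element 4 is linked to both model nodes, so it is exactly the kind of shared element through which the two datasets, and hence the two models, become related in the network.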


18 Collaborative Research 


The M* meta model provides several elements that can be employed to obtain insights about the characteristics and challenges of collaborative research. Of particular importance here is the requirement, in the M* meta model, of keeping full compatibility and consistency between datasets and models.

Science has largely relied on the integration of two complementary approaches: individual and collective. In the former case, we have a single scientist, possibly with the assistance of a team, working on specific problems. The latter approach is characterized by broader collaborative initiatives involving large projects, regular meetings and, more recently, WWW-based resources.

While the case of individual research can be directly related to the meta models suggested in the present work, the collaborative counterpart requires further analysis. In fact, it should be observed that a fully individual research initiative is virtually impossible, as one needs to learn and to communicate concepts and results.

Figure 26 illustrates a highly simplified situation involving several agents, like that in Figure 1, that can collaborate with one another through the represented network of information exchange.

Figure 26: A highly abstracted and simplified model of interaction between agents (living beings or machines) that develop models. Issues of particular relevance for collaborative research regard how well integrated the agents are, which depends on the network topology, as well as the importance of keeping consistency between communication, datasets and models not only at the individual agent level, but also among all the modelers. Two possible ways to implement the latter are through shared external resources such as databases, or through continuing exchanges of information between the agents.


As each researcher gathers new data and develops respective models, it becomes important to communicate these findings as widely as possible through the network, so that the results can be further validated and the other modeling frameworks can be updated and kept consistent. This requires a shared, standardized or translatable (e.g. via meta models) body of datasets and models. At the same time, it is essential that the same features are validated and adopted by all or most researchers. The larger the modeling framework, the more possibilities can be tried while combining and integrating the respective models, though this adds substantially to the overall complexity.
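The dependence of this communication on the network topology can be illustrated with a simple breadth-first propagation, in which a newly found (dataset, model) entry spreads hop by hop from the agent that obtained it. The agent names, the adjacency lists, and the representation of each agent's knowledge as a set of entries are hypothetical simplifications, not a model of how any actual collaborative infrastructure works.

```python
from collections import deque


def propagate(network, source, entry, registries):
    """Breadth-first spread of a new (dataset, model) entry from `source`
    to every reachable agent; returns the hops needed to reach each one."""
    hops = {source: 0}
    registries[source].add(entry)
    queue = deque([source])
    while queue:
        agent = queue.popleft()
        for neighbor in network[agent]:
            if neighbor not in hops:
                hops[neighbor] = hops[agent] + 1
                registries[neighbor].add(entry)
                queue.append(neighbor)
    return hops


# Hypothetical collaboration network given as undirected adjacency lists.
network = {"a1": ["a2"], "a2": ["a1", "a3"], "a3": ["a2"], "a4": []}
registries = {agent: set() for agent in network}
entry = (frozenset({2, 4, 6}), "even numbers up to 6")
hops = propagate(network, "a1", entry, registries)
print(hops)  # {'a1': 0, 'a2': 1, 'a3': 2}
```

The isolated agent `a4` never receives the entry, a minimal instance of how poorly integrated agents may end up with inconsistent modeling frameworks.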

Of particular interest is the possibility of constructing shared databases integrating all existing datasets and respective models, as well as employing automated means for implementing or assisting the modeling activity. The latter is of critical interest because the amount of information and knowledge currently necessary in many areas by far exceeds the cognitive and memory capabilities of any human being. Special attention should also be given to establishing common data and modeling formats and representations, especially concerning the identification of suitable data structures to be adopted in each specific situation or generalized as shared resources.

Other challenges in collaborative research involve how to cope with the several types of modeling errors discussed in this work, to which can be added errors and limitations of the communication through the existing network. Data and model curation, possibly assisted by automated means, could contribute to achieving reasonable levels of data and model quality. It is also important to continuously maintain and expand the communication resources.


19 Deep Learning 


Similarly to network science (e.g. [29, 30, 31]), deep learning (e.g. [34]) has achieved substantial acceptance and success in a relatively short period of time, while also relying on approaches going back to the 19th century, a great deal of which are related to the neuronal network paradigm.

The success of deep learning stems mainly from the fact that it paved the way to solving many problems that had remained major challenges for pattern recognition. This has been achieved thanks to several factors (e.g. [34]), including the availability of vast amounts of data and computing resources, as well as the development of new and creative concepts and methods. Most deep learning systems have their neuronal elements arranged in several sequential and/or parallel layers containing a vast number of components.

A typical deep learning system can be understood to involve a vast number of basic processing elements, analogous to neurons (e.g. [35, 36]), that are successively applied (often through convolution) to the incoming data in order to derive valuable features and perform successful pattern recognition.
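The operation of such a processing element can be sketched as a single artificial neuron whose weights (the kernel) are shared along the input, computing a weighted sum plus bias followed by a ReLU activation. This is a toy illustration under stated assumptions: the ReLU activation, the "valid" boundary handling, and the example kernel are arbitrary choices and, as is common in deep learning libraries, the kernel is not flipped, so strictly speaking the operation is a cross-correlation.

```python
def relu(x):
    """Rectified linear activation."""
    return max(0.0, x)


def conv1d_layer(signal, kernel, bias=0.0):
    """Slide one neuron with shared weights (the kernel) along the signal,
    computing weighted sum + bias and a ReLU at every 'valid' position.
    The kernel is not flipped, i.e. this is a cross-correlation."""
    k = len(kernel)
    return [relu(sum(w * s for w, s in zip(kernel, signal[i:i + k])) + bias)
            for i in range(len(signal) - k + 1)]


signal = [0, 0, 1, 1, 1, 0, 0]  # illustrative input pulse
kernel = [-1, 1]                # responds to rising edges
print(conv1d_layer(signal, kernel))  # [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
```

The kernel `[-1, 1]` responds only where the signal rises, so the output flags the rising edge of the input pulse; stacking such layers corresponds to the successive application mentioned above.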

Following the approach reported in this work, it should have become evident that the development of formal M* models requires strict maintenance of dataset and model consistency, not only within each of them but also between them, while also involving the ample consideration of every data element in the environment E at all times. For proper operation of the described modeling approach, every data element and model would need to be considered and related throughout the whole modeling activity. This fact may well be related to the critical importance of very large training datasets and processing resources that is characteristically found in deep learning.

Though remarkably successful in many applications, deep learning also has its own challenges, including the relative difficulty of inferring the rules through which the solutions have been obtained, or of translating the trained parameters into formulations that can be more easily communicated to humans.

Given that the M* and other related approaches described in the present work provide a relatively formal description of how datasets can be mapped into models, they may pave the way for identifying related methods for inferring the learned classification rules and translating them into more tangible statements. This could be done, for instance, by trying to associate formal or textual models with some of the datasets that have been assigned to categories by a deep learning system, and then trying to explain other, more complex datasets in terms of logical combinations of the identified datasets. The stochastic variant of the M* approach may be of particular interest in this respect, as it is directly related to the decision regions normally associated with the basic deep learning processing elements.
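The idea of explaining a classifier's output through logical combinations can be sketched under strong simplifying assumptions: a small two-dimensional environment, a hypothetical function `black_box` standing in for a trained deep learning system, and single-feature threshold predicates as candidate elementary models. The search keeps every conjunction of two candidates whose satisfying set coincides exactly with the dataset assigned to the category by the classifier.

```python
from itertools import combinations, product


def black_box(p):
    """Hypothetical stand-in for a trained classifier on 2-D inputs."""
    x, y = p
    return x >= 2 and y <= 1


E = list(product(range(4), repeat=2))       # data environment: 4 x 4 grid
assigned = {p for p in E if black_box(p)}   # dataset put in the category

# Candidate elementary models: single-feature thresholds (an assumption).
candidates = {}
for t in range(4):
    candidates[f"x >= {t}"] = lambda p, t=t: p[0] >= t
    candidates[f"y <= {t}"] = lambda p, t=t: p[1] <= t


def explain(assigned, candidates, environment):
    """Conjunctions of two candidate models whose dataset equals `assigned`."""
    found = []
    for (name_a, a), (name_b, b) in combinations(candidates.items(), 2):
        dataset_a = {p for p in environment if a(p)}
        dataset_b = {p for p in environment if b(p)}
        if (dataset_a & dataset_b) == assigned:  # AND of models <-> intersection
            found.append(f"({name_a}) and ({name_b})")
    return found


print(explain(assigned, candidates, E))  # ['(y <= 1) and (x >= 2)']
```

Real trained networks yield far less regular decision regions, so in practice an exact match would rarely be found and an approximate comparison between datasets would be needed.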

It may also be possible to conceive ways of integrating the M* framework within a deep learning system, so that the recognition of the input datasets can be directly accompanied by the identification of possible respective models. Another interesting possibility would be to adapt the impressive hardware resources developed for gaming and deep learning to perform the basic M* operations and manipulations faster.


20 Creativity 


As with complexity and other concepts that have received great attention, creativity has proven challenging to define and characterize. Here, we adopt the approach described in [37, 2], more specifically that creativity corresponds to ways of achieving effective results with relatively great efficiency, little cost, and great innovation. It is from this perspective that we briefly discuss how modeling, especially as approached in the M* initiative, is related to creativity.

One of the main ways in which creativity can be achieved consists in seeking analogies or metaphors between two or more problems that, though belonging to different areas, present some interesting similarities. An extremely simple illustration of this type of creative association is the pairing of numeric sequences such as 1, 2, 3, ... with successive letters such as a, b, c, ...

A particularly interesting point is that the M* modeling framework is critically dependent on establishing an effective bridge, through a strict bijective association, between set operations in the dataset domain and logical manipulations in the modeling domain. In addition, by providing possible logical and mathematical explanations that can eventually be translated into textual statements defining a model, the proposed framework contributes to making those datasets more accessible and tangible to those interested in their analysis and modeling.
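The correspondence between logical manipulations of models and set operations on datasets can be checked directly on a small example. In the sketch below, which assumes a finite environment made of the integers 1 to 20 and uses hypothetical example models, conjunction, disjunction and negation of models correspond, respectively, to intersection, union and complement (relative to the environment) of the associated datasets.

```python
E = frozenset(range(1, 21))  # hypothetical finite data environment


def dataset(model):
    """Dataset paired with a model: the elements of E that satisfy it."""
    return frozenset(x for x in E if model(x))


def is_even(n):
    return n % 2 == 0


def is_multiple_of_3(n):
    return n % 3 == 0


# Logical manipulations in the modeling domain (hypothetical models) ...
def even_and_m3(n):
    return is_even(n) and is_multiple_of_3(n)


def even_or_m3(n):
    return is_even(n) or is_multiple_of_3(n)


def not_even(n):
    return not is_even(n)


# ... correspond to set operations in the dataset domain.
assert dataset(even_and_m3) == dataset(is_even) & dataset(is_multiple_of_3)
assert dataset(even_or_m3) == dataset(is_even) | dataset(is_multiple_of_3)
assert dataset(not_even) == E - dataset(is_even)
print(sorted(dataset(even_and_m3)))  # [6, 12, 18]
```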

In addition, engines analogous to those illustrated in 
this work may be employed as means of providing insights 
about possible relationships between models and datasets, 
as well as clues on how to combine or develop new models 
capable of explaining new datasets. 

Following similar reasoning, the consideration of concepts and methods from a large number of areas adopted in the development of the M* framework also contributes to identifying possible creative analogies between their respective properties, challenges and advantages.
Additional aspects of the M* approach that may favor creativity include the establishment of relationships and similarities between the several datasets, which may be related to different fields. Thus, the basic set and logical operations underlying the M* approach can be understood to be directly related to potential creative associations between different datasets, areas, features, and types of models. The fact that several types of features and models may be incorporated into the suggested meta models also contributes to providing grounds for creative investigation.

Another, less directly identifiable, aspect concerns the fact that the M* approach evidences the breathtaking combinatorial complexity involved in model building for relatively large, or even moderate, Q sizes. This huge complexity can also be understood as providing more space and degrees of freedom when seeking creative approaches and solutions. Interestingly, complexity and creativity seem to be, in a sense, intrinsically connected and interdependent, even though they often oppose one another.
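A rough idea of this combinatorial growth can be obtained with a simple count. For illustration only, and without implying that this matches the exact definition of Q adopted earlier in the work, assume binary patterns defined on Q cells: there are then 2^Q possible patterns and 2^(2^Q) possible datasets (subsets of patterns), each of which would require its own model under a strict bijective association.

```python
def counts(Q):
    """Numbers of binary patterns on Q cells and of possible datasets
    (subsets of those patterns), under the assumption stated above."""
    patterns = 2 ** Q
    datasets = 2 ** patterns
    return patterns, datasets


for Q in range(1, 6):
    patterns, datasets = counts(Q)
    print(f"Q = {Q}: {patterns} patterns, {datasets} possible datasets")

# Already for Q = 10 the number of possible datasets has 309 decimal digits.
print(len(str(counts(10)[1])))  # 309
```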


21 Concluding Remarks 


And so we have reached the conclusion of the present work. It has been a relatively long development, as required by its broad objective of taking into account many concepts and areas as the basis for developing a putative meta modeling approach that could provide some insights about model building, decision making, and pattern recognition, among other possibilities.

We started by discussing what we called the informational schism that is unavoidably established between the real world and any modeling agent, be it a living being or a machine. As argued, the appearance of these agents was only made possible by the creative incorporation of modeling abilities capable of providing effective means for interacting with the respective environment. This modeling ability is particularly critical because it ultimately provides the means for taking effective decisions on subsequent actions, based on previous experiences and on the consideration of current environmental conditions.

Because of the central role of model building in so many areas, including pattern recognition, it becomes interesting to develop abstract models of model building itself, capable of providing some insights and resources for modeling. The main requirements that such a meta model should have were then identified and listed, including several constraints and sought properties.

The critically important task of mapping data into models was then approached, with special attention focused on the need to preserve as much information as possible, which was shown to be fully possible only when a bijective association is established between the existing datasets and the respective models. This relationship can be achieved by always ensuring that every element in a dataset satisfies the respective model, and vice versa, therefore implying a bijective mapping. It has also been shown that the current dataset environment, as well as the choice of features, contributes to defining whether a given association between dataset and model is bijective or not.
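The bijective condition described above can be tested directly for a finite environment. The sketch below reads "vice versa" as requiring that every element of the environment that satisfies the model belongs to the dataset; this reading, the environment, and the function names are assumptions made for illustration. The last call shows how enlarging the environment can break an association that was bijective in a smaller one.

```python
import math


def is_bijective_pair(dataset, model, environment):
    """True when the dataset contains exactly the elements of the
    environment that satisfy the model (in both directions)."""
    satisfied = {x for x in environment if model(x)}
    return set(dataset) == satisfied


def is_prime(n):
    return n > 1 and all(n % d for d in range(2, math.isqrt(n) + 1))


E = range(1, 11)  # hypothetical environment: integers 1 to 10
print(is_bijective_pair({2, 3, 5, 7}, is_prime, E))             # True
print(is_bijective_pair({2, 3, 5}, is_prime, E))                # False: 7 is missing
print(is_bijective_pair({2, 3, 5, 7}, is_prime, range(1, 15)))  # False: 11, 13 are missing
```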

At the same time, the generalization provided by each model in explaining all the elements of the associated dataset, which requires a non-injective mapping from data elements to models, was accommodated by the fact that the bijective relationship is maintained at the level of (dataset, model) associations. The incorporation of parameters and features, two important components of model building and pattern recognition, was also discussed and addressed.

Subsequently, we identified and briefly considered some of the main sources of limitations in achieving a complete and precise model, which include noise, sampling, and errors. The characteristics and effects of these limitations were then discussed with respect to the main components and actions involved in model building.





Having thus gathered insights from the several points presented and discussed concerning modeling, decision making and pattern recognition, we then approached the development of a relatively formal meta modeling framework, called M*.

This framework provides several interesting features that emanate from the strict bijective association established between data and models, including the definition of a bridge between these two worlds and the derivation of a paired algebra of datasets and models, which can be employed to find models for new datasets through the logical combination of model statements as well as in terms of set operations between the existing datasets. The M* approach was then illustrated with a case example involving datasets composed of patterns derived from binary lattices. In fact, despite its seemingly strict requirements, there are several problems characterized by relatively small amounts of discrete data that can be approached by using the M* framework.

The requirements underlying the M* approach were then progressively relaxed as the modeling approach was extended to cope with some of the severe limitations that are characteristic of pattern recognition and scientific modeling, including noise, errors and sampling. This led to two further variants of the meta model: the first accounts for comparing data in the presence of error or sampling, and was illustrated with respect to elementary number theory and the verification of hierarchical structures; the second incorporates means for dealing with the frequently necessary stochastic description of datasets in terms of probability densities, and was subsequently illustrated for a database related to iris flowers.
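One simple way in which the strict equality demanded by M* could be relaxed in the presence of errors or sampling is to accept a model when the Jaccard index between the observed dataset and the dataset predicted by the model reaches a tolerance. This criterion, the tolerance value, and the function names are assumptions for illustration and are not necessarily the comparison adopted in the variants summarized above.

```python
def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| (taken as 1 for two empty sets)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def approximately_explains(observed, model, environment, tolerance=0.8):
    """Accept the model when its predicted dataset is close enough
    (hypothetical Jaccard tolerance) to the observed one."""
    predicted = {x for x in environment if model(x)}
    return jaccard(observed, predicted) >= tolerance


def is_even(n):
    return n % 2 == 0


E = range(1, 21)
observed = {2, 4, 6, 8, 10, 12, 14, 16, 18, 3}  # 20 missed, 3 spurious
print(round(jaccard(observed, {x for x in E if is_even(x)}), 3))  # 0.818
print(approximately_explains(observed, is_even, E))                 # True
print(approximately_explains(observed, is_even, E, tolerance=0.9))  # False
```

Here one even number is missing and one odd number is spurious, giving a Jaccard index of 9/11, so the model is accepted at a tolerance of 0.8 but rejected at 0.9.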

Though we have seen that pattern recognition corresponds to a kind of modeling, it often has the specific characteristic that the condition to be satisfied corresponds to the membership of the respective dataset in a given category. Oftentimes, but not always, the result of pattern recognition does not incorporate a more complete description of why the dataset has been assigned to a certain class. In addition, the data environment E tends to be constrained in practical pattern recognition problems, while in scientific modeling E is usually assumed to extend as much as possible toward the whole physical world. Nevertheless, pattern recognition remains an instance of model building.

In addition to providing insights about the intricacies of modeling, the suggested frameworks may also be used to derive practical methods and software engines for automated or assisted scientific modeling and pattern recognition, among other possibilities.

At the same time, the presented developments also emphasize the need to keep data and models as consistent and integrated as possible, which poses specific challenges regarding formats, data integrity, and validation, among other issues.

The important related subjects of clustering, complexity, collaborative research, deep learning, and creativity were then considered and discussed in terms of several of the concepts and insights provided by the reported modeling framework.

The many implications and possibilities allowed by the presented concepts and methods pave the way to a substantial number of possible future developments. These include, but are not limited to, further extending the family of models derived from the reference M* framework so as to address additional constraints, developing effective concepts and methods for implementing kernel expansion in higher-dimensional feature spaces, designing practical engines for applying the described models, integrating the latter with and within deep learning concepts and implementations, and considering other important activities that are also related to model building (e.g. planning, diagnosis, learning, recommendation, etc.). A particularly fundamental remaining question is whether real-world entities have precisely well-defined models that are completely independent of humans and, if so, how these models could be inferred.

Going back to the initial problem of how living beings and other modeling agents, including humans and machines, may have overcome the so-called information schism, the concepts and possibilities discussed in Section 18 can be understood to provide related insights. More specifically, the information schism seems to have been circumvented by creative modeling frameworks, which may well be directly related to those described in the present work, especially in the sense of providing means for progressive adaptation to environmental demands, and by establishing communication and collaboration between the involved modeling agents. These insights may also extend to smaller scales, such as the molecular or cellular level, for instance regarding the onset of multicellular organisms, which could be understood as an interconnected body of individual specialized agents exchanging mass, energy and information.





All in all, it is felt that the developed modeling approach has the potential to satisfy, at least partially, many of the requisites listed in Section 2. At the same time, it is important to keep in mind that several of the suggested concepts and models are still subject to further formalization, validation, and extension. Perhaps one of the main results of the described developments ultimately resides in the fact that, while it is interesting to find out whether a given dataset belongs to a specific category, it may be even more interesting to have a complete objective description, in the form of a respective model, providing possible explanations of why this takes place. Another critical aspect of the M* framework is that, though it can be applied even at the level of individual data elements, it emphasizes the consideration of collections of elements as providing more significant grounds for strong model association.

We conclude by presenting in Figure 27 a graph abstract illustrating the main concepts addressed in the present work, as well as some of their most relevant interconnections.

Acknowledgments.

Luciano da F. Costa thanks CNPq (grant no. 307085/2018-0) and FAPESP (grant 15/22308-2).

[Figure 27 graph nodes: SAMPLING, ERRORS, COMPLEXITY, CREATIVITY, COLLABORATION, DEEP LEARNING, COMPLETENESS, MODELING, ABSTRACTION, PATTERN RECOGN., DECISION MAKING, REAL-WORLD, CAUSALITY]

Figure 27: Graph abstract interrelating the main subjects of this work: the main concepts and areas covered here, as well as some of their most important interconnections. The main issues associated with the modeling challenges are shown in orange, while some possible concepts that can contribute to solving those issues are represented in light blue.

References

[1] L. da F. Costa. Modeling: The human approach to science. Researchgate, 2019. https://www.researchgate.net/publication/333389500_Modeling_The_Human_Approach_to_Science_CDT-8. [Online; accessed 1-Oct-2020.].
[2] L. da F. Costa. Learning, understanding, predicting, creating. Researchgate, 2019. https://www.researchgate.net/publication/348352157_Learning_Understanding_Predicting_Creating_CDT-52. [Online; accessed 2-Sept-2021.].
[3] K. Koutroumbas and S. Theodoridis. Pattern Recognition. Academic Press, 2008.
[4] L. da F. Costa. Pattern cognition, pattern recognition. Researchgate, Dec 2019. https://www.researchgate.net/publication/338168835_Pattern_Cognition_Pattern_Recognition_CDT-19. [Online; accessed 29-Feb-2020].
[5] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. Wiley Interscience, 2000.
[6] S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Pearson, 2020.
[7] L. da F. Costa. Complex numbers: Real applications of an imaginary concept. https://www.researchgate.net/publication/349947136_Complex_Numbers_Real_Applications_of_an_Imaginary_Concept_CDT-56, 2021. [Online; accessed 21-Aug-2021].
[8] L. da F. Costa and R. M. Cesar Jr. Shape Classification and Analysis: Theory and Practice. CRC Press, Boca Raton, 2nd edition, 2009.
[9] Wikipedia. Jaccard index. https://en.wikipedia.org/wiki/Jaccard_index. [Online; accessed 10-Oct-2021].
[10] L. da F. Costa. Convolution! Researchgate, 2019. https://www.researchgate.net/publication/336601899_Convolution_CDT-14. [Online; accessed 09-March-2020.].
[11] Wikipedia. Kernel density estimation. https://en.wikipedia.org/wiki/Kernel_density_estimation. [Online; accessed 27-July-2019].
[12] E. O. Brigham. Fast Fourier Transform and its Applications. Pearson, 1988.
[13] J. D. Loudin and H. E. Miettinen. A multivariate method for comparing n-dimensional distributions. In PHYSTAT2003, SLAC, 2003.
[14] L. da F. Costa. Further generalizations of the Jaccard index. https://www.researchgate.net/publication/355381945_Further_Generalizations_of_the_Jaccard_Index, 2021. [Online; accessed 21-Aug-2021].
[15] L. da F. Costa. An introduction to multisets. https://www.researchgate.net/publication/355437006_An_Introduction_to_Multisets_CDT-63, 2021. [Online; accessed 21-Aug-2021].
[16] J. Serra. Image Analysis and Mathematical Morphology. Academic Press, 1983.
[17] U. v. Luxburg, R. C. Williamson, and I. Guyon. Clustering: Science or art? In JMLR: Workshop and Conference Proceedings, pages 65-79, 2012.
[18] C. Hennig. What are the true clusters? Pattern Recognition Letters, 64:53-62, 2015.
[19] L. da F. Costa. Toward generalized clustering through an one-dimensional approach. arXiv, 2020. https://arxiv.org/abs/2001.02741. [Online; accessed 09-March-2020.].
[20] C. H. Comin, F. N. Silva, and L. da F. Costa. A framework for evaluating complex networks measurements. Europhysics Letters, 110:68002, 2015.
[21] C. H. Comin, F. N. Silva, and L. da F. Costa. Malleability of complex networks. J. Stat. Phys., 52:083203, 2019.
[22] M. Waldrop. Complexity: The Emerging Science at the Edge of Order and Chaos. Simon and Schuster, 1993.
[23] S. Kauffman. At Home in the Universe: The Search for the Laws of Self-Organization and Complexity. Oxford University Press, 1996.
[24] F. Heylighen. What is complexity? http://pespmc1.vub.ac.be/COMPLEXI.html, 1996. [Online; accessed 05-May-2019].
[25] L. Lofgren. Complexity of descriptions of systems: A foundational study. Intl. J. Gen. Systems, 3:197-214, 2007.
[26] B. Edmonds. What is complexity? - the philosophy of complexity per se with application to some examples in evolution. http://cogprints.org/357/, 07 1995. [Online; accessed 05-May-2019].
[27] N. Immerman. Descriptive Complexity. Springer, 2015.
[28] L. da F. Costa. Quantifying complexity. Researchgate, 2019. https://www.researchgate.net/publication/332877069_Quantifying_Complexity_CDT-6. [Online; accessed 30-July-2019.].
[29] A. L. Barabasi and M. Posfai. Network Science. Cambridge University Press, 2016.
[30] M. Newman. Networks: An Introduction. Oxford University Press, 2010.
[31] L. da F. Costa. What is a complex network? https://www.researchgate.net/publication/324312765_What_is_a_Complex_Network_CDT-2, 2018. [Online; accessed 05-May-2019].
[32] C. H. Comin, T. Peron, F. N. Silva, D. R. Amancio, F. A. Rodrigues, and L. da F. Costa. Complex systems: Features, similarity and connectivity. Physics Reports, 861:1-41, 2020.
[33] L. da F. Costa, O. N. Oliveira Jr., G. Travieso, F. A. Rodrigues, P. R. Villas Boas, L. Antiqueira, M. P. Viana, and L. E. C. Rocha. Analyzing and modeling real-world phenomena with complex networks: a survey of applications. Advances in Physics, pages 329-412, 2011.
[34] H. F. de Arruda, A. Benatti, C. H. Comin, and L. da F. Costa. Learning deep learning. Researchgate, 2019. https://www.researchgate.net/publication/335798012_Learning_Deep_Learning_CDT-15. [Online; accessed 22-Dec-2019.].
[35] L. da F. Costa. Neurons as pattern recognizers. Researchgate, 2020. https://www.researchgate.net/publication/340257021_Neurons_as_Pattern_Recognizers_CDT-25. [Online; accessed 18-Apr-2020.].
[36] S. Haykin. Neural Networks and Learning Machines. McGraw-Hill Education, 9th edition, 2013.
[37] L. da F. Costa. Creativity and complexity. Researchgate, 2021. https://www.researchgate.net/publication/334477701_Creativity_and_Complexity_CDT-12. [Online; accessed 2-Sept-2021.].